Python Forum
Problem with Generator
#1
Hi all

I am trying to learn about generators. The code below is behaving strangely (the aim is to calculate the average raisedAmt of the companies in round "a"):

techcruncher.csv:

Output:
permalink,company,numEmps,category,city,state,fundedDate,raisedAmt,raisedCurrency,round
digg,Digg,60,web,San Francisco,CA,01-Dec-06,8500000,USD,b
digg,Digg,60,web,San Francisco,CA,01-Oct-05,2800000,USD,a
facebook,Facebook,450,web,Palo Alto,CA,01-Sep-04,500000,USD,angel
facebook,Facebook,450,web,Palo Alto,CA,01-May-05,12700000,USD,a
photobucket,Photobucket,60,web,Palo Alto,CA,01-Mar-05,3000000,USD,a
file_name = "techcruncher.csv"
lines = (line for line in open(file_name))
list_line = (s.rstrip().split(",") for s in lines)
cols = next(list_line)
company_dicts = (dict(zip(cols,data)) for data in list_line)
unique_companies = (
    company_dict["company"]
    for company_dict in company_dicts
    if company_dict["round"] == ("a".lower())
)
number_of_companies = len(list(unique_companies))
print(number_of_companies)
I get the output below, which is what I expect.

Output:
PS D:\python> python generatorlearn.py
3
However, when I change the code to this:

file_name = "techcruncher.csv"
lines = (line for line in open(file_name))
list_line = (s.rstrip().split(",") for s in lines)
cols = next(list_line)
company_dicts = (dict(zip(cols,data)) for data in list_line)
funding = (
    int(company_dict["raisedAmt"])
    for company_dict in company_dicts
    if company_dict["round"] == ('A'.lower())
)
total_series_a = sum(funding)
print(total_series_a)
unique_companies = (
    company_dict["company"]
    for company_dict in company_dicts
    if company_dict["round"] == ("a".lower())
)
number_of_companies = len(list(unique_companies))
print(number_of_companies)
(the new lines are 6-12 inclusive, from "funding = (" through "print(total_series_a)"), I get the following output:

Output:
PS D:\python> python generatorlearn.py
18500000
0
Somehow the generator unique_companies does not seem to work anymore.
I would appreciate it if someone could explain why this is the case.
Thanks in advance!
Reply
#2
company_dicts is a generator, and you can iterate over it only once. After iterating over it for funding, the generator is exhausted, so unique_companies has nothing left to consume.
By the way, you are doing a lot of redundant stuff here...
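
A minimal sketch of the exhaustion described above. It assumes a small in-memory list (rows, a hypothetical stand-in for the parsed CSV) so that it runs without the file; the values are taken from the sample data in the first post.

```python
# Hypothetical in-memory rows standing in for the parsed CSV lines.
rows = [
    {"company": "Digg", "round": "b", "raisedAmt": "8500000"},
    {"company": "Digg", "round": "a", "raisedAmt": "2800000"},
    {"company": "Facebook", "round": "angel", "raisedAmt": "500000"},
    {"company": "Facebook", "round": "a", "raisedAmt": "12700000"},
    {"company": "Photobucket", "round": "a", "raisedAmt": "3000000"},
]

# A generator expression can be consumed only once.
company_dicts = (row for row in rows)

funding = (int(d["raisedAmt"]) for d in company_dicts if d["round"] == "a")
total_series_a = sum(funding)  # this pulls every item out of company_dicts

# company_dicts is now exhausted, so this second generator yields nothing.
companies = (d["company"] for d in company_dicts if d["round"] == "a")
number_of_companies = len(list(companies))

print(total_series_a)       # 18500000
print(number_of_companies)  # 0

# A list, by contrast, can be iterated as many times as needed.
company_list = list(rows)
count_from_list = sum(1 for d in company_list if d["round"] == "a")
print(count_from_list)      # 3
```

The first two numbers match the output reported in the question; the count only comes out right when the data is kept in something reusable, such as a list.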
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#3
Thanks for the tip.

I tried the following and it works. Hopefully there's no redundant stuff here:

file_name = "techcruncher.csv"
lines = (line for line in open(file_name))
list_line = (s.rstrip().split(",") for s in lines)
cols = next(list_line)
company_dicts = (dict(zip(cols,data)) for data in list_line)
z = list(company_dicts)
funding = (
        int(item["raisedAmt"])
        for item in z
        if item["round"] == 'a')
x = list(funding)
print(sum(x)/len(x))
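
If the goal is to stay with generators and avoid building lists, one possible alternative is to accumulate the total and the count in a single pass. This is only a sketch: average_round_a is a hypothetical helper name, and the sample rows are an in-memory stand-in for the CSV.

```python
def average_round_a(dict_rows):
    """Return the mean raisedAmt of round "a" rows, or None if there are none.

    dict_rows may be any iterable of dicts, including a one-shot generator,
    because it is traversed exactly once.
    """
    total = 0
    count = 0
    for row in dict_rows:
        if row["round"] == "a":
            total += int(row["raisedAmt"])
            count += 1
    return total / count if count else None


# Hypothetical sample data, wrapped in a generator to show one pass is enough.
sample = (
    row
    for row in [
        {"round": "b", "raisedAmt": "8500000"},
        {"round": "a", "raisedAmt": "2800000"},
        {"round": "angel", "raisedAmt": "500000"},
        {"round": "a", "raisedAmt": "12700000"},
        {"round": "a", "raisedAmt": "3000000"},
    ]
)

result = average_round_a(sample)
print(result)
```

Because the sum and the count come from the same loop, nothing has to be iterated twice and no intermediate list is kept in memory.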
Reply
#4
(Feb-13-2020, 02:37 AM)palladium Wrote: Hopefully there's no redundant stuff here
Given that you want to learn about generators, I will keep your approach although there are better ways to do this.

1. You can combine lines 2 and 3 (lines and list_line) into one generator expression.
2. It's better to use a with context manager to open the file.
3. Line 5 (company_dicts) can be a list directly, instead of making it a generator expression and immediately converting it to a list on the next line.
4. The same goes for lines 7-11 (funding and x).

You may want to look at csv.DictReader,
and you can iterate over the file handle directly instead of creating a generator.

import csv
with open("techcruncher.csv") as f:
    rdr = csv.DictReader(f)
    round_a_funding = [int(item["raisedAmt"]) for item in rdr if item["round"] == 'a']

print(f'Average Round A funding is {sum(round_a_funding)/len(round_a_funding):.2f}')
As an alternative, you may look at pandas to process the data as a DataFrame.
Reply
#5
You can use the statistics module.

Buran's code, extended.
Instead of a list comprehension, you should use a for-loop.
The reason is that you can't catch exceptions inside a list comprehension.
If a single value is not an integer, the whole comprehension raises and you get nothing back.
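
A small sketch of that difference, assuming a hypothetical list of raw values with one bad entry (raw_values and parse_amounts are illustrative names, not part of the original code):

```python
# Hypothetical raw column values; "n/a" cannot be converted to int.
raw_values = ["2800000", "n/a", "3000000"]

# The try/except has to wrap the whole comprehension, so a single bad
# value discards every result, including the ones that parsed fine.
try:
    amounts = [int(v) for v in raw_values]
except ValueError:
    amounts = None


def parse_amounts(values):
    """Yield each value as int, skipping the ones that fail to parse."""
    for v in values:
        try:
            yield int(v)
        except ValueError:
            continue  # skip only the bad value and keep going


parsed = list(parse_amounts(raw_values))

print(amounts)  # None
print(parsed)   # [2800000, 3000000]
```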

With pandas this should be easier, but I don't use it.


import csv
import statistics


def get_raisedAmt(file, round):
    """
    Generator which yields the column 'raisedAmt'
    from selected round.
    """
    with open(file) as fd:
        rdr = csv.DictReader(fd)
        # each iteration of DictReader
        # yields a dictionary;
        # the column header is parsed automatically
        for item in rdr:
            if item["round"] == round:
                try:
                    value = int(item["raisedAmt"])
                except ValueError:
                    continue
                else:
                    yield value


# If you want to reuse the yielded values,
# use tuple, list or other Type you want
my_raised_amt = list(get_raisedAmt("techcruncher.csv", "a"))


# If you know before, that you don't need the original values
# you let statistics.mean or statistics.median consume the generator
# If you have many rows, this saves a lot of memory
my_mean = statistics.mean(get_raisedAmt("techcruncher.csv", "a"))
print("my_mean:", my_mean)

# Now a little bit of statistics
mean = statistics.mean(my_raised_amt)
median = statistics.median(my_raised_amt)
print("Mean:", mean)
print("Median:", median)

# there is also a faster method: statistics.fmean (Python 3.8+)
fmean = statistics.fmean(my_raised_amt)
print("fmean:", fmean)
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
Reply
#6
Thanks Buran and DeadEye for your input, much appreciated.
Reply