Python Forum
Problem with Generator - Printable Version




Problem with Generator - palladium - Feb-11-2020

Hi all

I am trying to learn about generators. The code below behaves strangely (the aim is to calculate the average raisedAmt of the companies in round "a"):

techcruncher.csv:

Output:
permalink,company,numEmps,category,city,state,fundedDate,raisedAmt,raisedCurrency,round
digg,Digg,60,web,San Francisco,CA,01-Dec-06,8500000,USD,b
digg,Digg,60,web,San Francisco,CA,01-Oct-05,2800000,USD,a
facebook,Facebook,450,web,Palo Alto,CA,01-Sep-04,500000,USD,angel
facebook,Facebook,450,web,Palo Alto,CA,01-May-05,12700000,USD,a
photobucket,Photobucket,60,web,Palo Alto,CA,01-Mar-05,3000000,USD,a
file_name = "techcruncher.csv"
lines = (line for line in open(file_name))
list_line = (s.rstrip().split(",") for s in lines)
cols = next(list_line)
company_dicts = (dict(zip(cols,data)) for data in list_line)
unique_companies = (
    company_dict["company"]
    for company_dict in company_dicts
    if company_dict["round"] == ("a".lower())
)
number_of_companies = len(list(unique_companies))
print(number_of_companies)
I get the output below, which is what I expect.

Output:
PS D:\python> python generatorlearn.py
3
However, when I change the code to this:

file_name = "techcruncher.csv"
lines = (line for line in open(file_name))
list_line = (s.rstrip().split(",") for s in lines)
cols = next(list_line)
company_dicts = (dict(zip(cols,data)) for data in list_line)
funding = (
    int(company_dict["raisedAmt"])
    for company_dict in company_dicts
    if company_dict["round"] == ('A'.lower())
)
total_series_a = sum(funding)
print(total_series_a)
unique_companies = (
    company_dict["company"]
    for company_dict in company_dicts
    if company_dict["round"] == ("a".lower())
)
number_of_companies = len(list(unique_companies))
print(number_of_companies)
(new lines are 6-12 inclusive), I get the following output:

Output:
PS D:\python> python generatorlearn.py
18500000
0
Somehow the generator unique_companies does not seem to work anymore.
I would appreciate it if someone could explain why this is the case.
Thanks in advance!


RE: Problem with Generator - buran - Feb-11-2020

company_dicts is a generator, and you can iterate over it only once. After sum(funding) has iterated over it, the generator is exhausted, so unique_companies has nothing left to read.
By the way, you are doing a lot of redundant stuff here...
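
A minimal sketch of that behaviour, using a made-up generator of squares rather than the CSV data (the names here are only for illustration):

```python
# A generator expression can be consumed only once.
numbers = (n * n for n in range(4))

first_pass = list(numbers)   # consumes every item
second_pass = list(numbers)  # nothing is left to yield

print(first_pass)   # [0, 1, 4, 9]
print(second_pass)  # []

# If more than one pass is needed, materialise the data once.
squares = [n * n for n in range(4)]
total = sum(squares)
count = len(squares)
print(total / count)  # 3.5
```

The second pass silently yields nothing instead of raising an error, which is why the count in the question came out as 0.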


RE: Problem with Generator - palladium - Feb-13-2020

Thanks for the tip.

I tried the following and it works. Hopefully there's no redundant stuff here:

file_name = "techcruncher.csv"
lines = (line for line in open(file_name))
list_line = (s.rstrip().split(",") for s in lines)
cols = next(list_line)
company_dicts = (dict(zip(cols,data)) for data in list_line)
z = list(company_dicts)
funding = (
        int(item["raisedAmt"])
        for item in z
        if item["round"] == 'a')
x = list(funding)
print(sum(x)/len(x))



RE: Problem with Generator - buran - Feb-13-2020

(Feb-13-2020, 02:37 AM)palladium Wrote: Hopefully there's no redundant stuff here
Given that you want to learn about generators, I will keep your approach although there are better ways to do this.

1. You can combine lines 2 and 3 into one generator expression.
2. It's better to open the file with a context manager (a with statement).
3. Line 5 can be a list directly, instead of a generator expression that is immediately turned into a list on the next line.
4. The same goes for lines 7-11.
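
Applied to the code from the previous post, those four points might look roughly like the sketch below. It is only one possible reading of them. So that it runs on its own, it writes the sample rows from the first post to a temporary file, which is an assumption of this example and not part of the original script.

```python
import os
import tempfile

# Sample rows copied from the first post, written to a temporary file
# so that the sketch is self-contained.
SAMPLE = """permalink,company,numEmps,category,city,state,fundedDate,raisedAmt,raisedCurrency,round
digg,Digg,60,web,San Francisco,CA,01-Dec-06,8500000,USD,b
digg,Digg,60,web,San Francisco,CA,01-Oct-05,2800000,USD,a
facebook,Facebook,450,web,Palo Alto,CA,01-Sep-04,500000,USD,angel
facebook,Facebook,450,web,Palo Alto,CA,01-May-05,12700000,USD,a
photobucket,Photobucket,60,web,Palo Alto,CA,01-Mar-05,3000000,USD,a
"""

with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(SAMPLE)
    file_name = tmp.name

# Point 2: the with statement closes the file when the block ends.
with open(file_name) as f:
    # Point 1: strip and split in a single generator expression.
    rows = (line.rstrip().split(",") for line in f)
    cols = next(rows)  # the header row
    # Point 3: build the list directly while the file is still open.
    company_dicts = [dict(zip(cols, data)) for data in rows]

os.remove(file_name)  # clean up the temporary file

# Point 4: a list comprehension instead of a generator plus list().
funding = [int(item["raisedAmt"]) for item in company_dicts if item["round"] == "a"]
print(sum(funding) / len(funding))
```

The list of dictionaries has to be built inside the with block, because the rows generator reads lazily from the open file.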

You may want to look at csv.DictReader from the csv module,
and you can iterate over the file handle directly, instead of creating a generator:

import csv
with open("techcruncher.csv") as f:
    rdr = csv.DictReader(f)
    round_a_funding = [int(item["raisedAmt"]) for item in rdr if item["round"] == 'a']

print(f'Average Round A funding is {sum(round_a_funding)/len(round_a_funding):.2f}')
As an alternative, you may look at pandas to process the data as a DataFrame.


RE: Problem with Generator - DeaD_EyE - Feb-13-2020

You can use the statistics module.

Buran's code, extended.
Instead of a list comprehension you should use a for-loop.
The reason is that you can't catch exceptions inside a list comprehension.
If even one value is not an integer, the ValueError aborts the whole comprehension and you get nothing back.

With pandas this should be easier, but I don't use it.


import csv
import statistics


def get_raisedAmt(file, round):
    """
    Generator which yields the column 'raisedAmt'
    from selected round.
    """
    with open(file) as fd:
        rdr = csv.DictReader(fd)
        # each iteration of DictReader
        # yields a dictionary
        # the column header is parsed automatically
        for item in rdr:
            if item["round"] == round:
                try:
                    value = int(item["raisedAmt"])
                except ValueError:
                    continue
                else:
                    yield value


# If you want to reuse the yielded values,
# store them in a tuple, list or whatever other type you want
my_raised_amt = list(get_raisedAmt("techcruncher.csv", "a"))


# If you know beforehand that you don't need the original values,
# you can let statistics.mean or statistics.median consume the generator.
# With many rows this can save memory
# (median still has to sort a full copy internally)
my_mean = statistics.mean(get_raisedAmt("techcruncher.csv", "a"))
print("my_mean:", my_mean)

# Now a little bit of statistics
mean = statistics.mean(my_raised_amt)
median = statistics.median(my_raised_amt)
print("Mean:", mean)
print("Median:", median)

# there is also a faster function: statistics.fmean
# (Python 3.8+, always returns a float)
fmean = statistics.fmean(my_raised_amt)
print("fmean:", fmean)
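
To illustrate the point about exceptions in comprehensions, here is a small sketch with a made-up list of strings standing in for the raisedAmt column. The values and the helper name to_ints are hypothetical and do not come from the CSV file.

```python
# Hypothetical column with one cell that is not an integer.
raw_values = ["2800000", "n/a", "3000000"]

# A list comprehension is all-or-nothing: the ValueError raised for "n/a"
# propagates out and no partial list is returned.
try:
    amounts = [int(v) for v in raw_values]
except ValueError:
    amounts = None


def to_ints(values):
    """Yield only the values that parse as integers, skipping the rest."""
    for v in values:
        try:
            yield int(v)
        except ValueError:
            continue


cleaned = list(to_ints(raw_values))
print(amounts)  # None
print(cleaned)  # [2800000, 3000000]
```

Whether silently skipping bad rows is what you want depends on the data. Logging them or counting them inside the except branch would work the same way.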



RE: Problem with Generator - palladium - Feb-16-2020

Thanks Buran and DeadEye for your input, much appreciated.