Problem with Generator - Python Forum (https://python-forum.io), Python Coding, General Coding Help, thread /thread-24370.html
Problem with Generator - palladium - Feb-11-2020

Hi all,

I am trying to learn about generators. The code below is showing some strange behaviour (the aim is to calculate the average raisedAmt for the companies in round "a").

techcruncher.csv:
    file_name = "techcruncher.csv"
    lines = (line for line in open(file_name))
    list_line = (s.rstrip().split(",") for s in lines)
    cols = next(list_line)
    company_dicts = (dict(zip(cols,data)) for data in list_line)
    unique_companies = (
        company_dict["company"]
        for company_dict in company_dicts
        if company_dict["round"] == ("a".lower())
    )
    number_of_companies = len(list(unique_companies))
    print(number_of_companies)

I get the output below, which is what I expect. However, when I change the code to this:

    file_name = "techcruncher.csv"
    lines = (line for line in open(file_name))
    list_line = (s.rstrip().split(",") for s in lines)
    cols = next(list_line)
    company_dicts = (dict(zip(cols,data)) for data in list_line)
    funding = (
        int(company_dict["raisedAmt"])
        for company_dict in company_dicts
        if company_dict["round"] == ('A'.lower())
    )
    total_series_a = sum(funding)
    print(total_series_a)
    unique_companies = (
        company_dict["company"]
        for company_dict in company_dicts
        if company_dict["round"] == ("a".lower())
    )
    number_of_companies = len(list(unique_companies))
    print(number_of_companies)

(new lines are 6-12 inclusive), I get the following output:

Somehow the generator unique_companies does not seem to work anymore. I would appreciate it if someone could explain why this is the case. Thanks in advance!

RE: Problem with Generator - buran - Feb-11-2020

company_dicts is a generator, and you can iterate over it only once. After sum(funding) has iterated over it for the funding, the generator is exhausted, so unique_companies has nothing left to yield. By the way, you are doing a lot of redundant stuff here...

RE: Problem with Generator - palladium - Feb-13-2020

Thanks for the tip. I tried the following and it works.
Hopefully there's no redundant stuff here:

    file_name = "techcruncher.csv"
    lines = (line for line in open(file_name))
    list_line = (s.rstrip().split(",") for s in lines)
    cols = next(list_line)
    company_dicts = (dict(zip(cols,data)) for data in list_line)
    z = list(company_dicts)
    funding = (
        int(item["raisedAmt"])
        for item in z
        if item["round"] == 'a')
    x = list(funding)
    print(sum(x)/len(x))

RE: Problem with Generator - buran - Feb-13-2020

(Feb-13-2020, 02:37 AM)palladium Wrote: Hopefully there's no redundant stuff here

Given that you want to learn about generators, I will keep your approach, although there are better ways to do this.

1. You can combine lines 2 and 3 into one generator expression.
2. It's better to use the with context manager to open the file.
3. Line 5 can be a list directly, instead of making it a generator expression and immediately turning it into a list on the next line.
4. The same goes for lines 7-11.

You may want to look at csv.DictReader, and you can iterate over the file handle directly instead of creating a generator:

    import csv

    with open("techcruncher.csv") as f:
        rdr = csv.DictReader(f)
        round_a_funding = [int(item["raisedAmt"]) for item in rdr if item["round"] == 'a']
    print(f'Average Round A funding is {sum(round_a_funding)/len(round_a_funding):.2f}')

As an alternative, you may look at Pandas to process the data as a dataframe.

RE: Problem with Generator - DeaD_EyE - Feb-13-2020

You can use the statistics module. Below is Buran's code, extended. Instead of a list comprehension you should use a for-loop. The reason is that you can't catch exceptions inside a list comprehension: if one value is not an integer, the whole comprehension raises and you get nothing back. With pandas this should be easier, but I don't use it.

    import csv
    import statistics


    def get_raisedAmt(file, round):
        """
        Generator which yields the column 'raisedAmt' from the selected round.
        """
        with open(file) as fd:
            rdr = csv.DictReader(fd)
            # each iteration of DictReader
            # yields a dictionary
            # the column header is parsed automatically
            for item in rdr:
                if item["round"] == round:
                    try:
                        value = int(item["raisedAmt"])
                    except ValueError:
                        # skip rows whose raisedAmt is not an integer
                        continue
                    else:
                        yield value


    # If you want to reuse the yielded values,
    # use tuple, list or whatever other type you want
    my_raised_amt = list(get_raisedAmt("techcruncher.csv", "a"))

    # If you know beforehand that you don't need the original values,
    # you can let statistics.mean or statistics.median consume the generator.
    # That way you don't keep your own list around (the function may still
    # build one internally; statistics.median, for example, sorts the data)
    my_mean = statistics.mean(get_raisedAmt("techcruncher.csv", "a"))
    print("my_mean:", my_mean)

    # Now a little bit of statistics
    mean = statistics.mean(my_raised_amt)
    median = statistics.median(my_raised_amt)
    print("Mean:", mean)
    print("Median:", median)

    # there is also a faster method: statistics.fmean (Python 3.8+)
    fmean = statistics.fmean(my_raised_amt)
    print("fmean:", fmean)

RE: Problem with Generator - palladium - Feb-16-2020

Thanks Buran and DeadEye for your input, much appreciated.