Iterate over data and sum - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: Data Science (https://python-forum.io/forum-44.html) +--- Thread: Iterate over data and sum (/thread-21772.html) |
Iterate over data and sum - Madame32 - Oct-13-2019 Hi guys. Hope you can helped. I have been thinking about how to solve this problem, but I cant. Basically, I have a dataset of around 800 rows, with a name that contains a value. The dataset have multiple of the same values. What I am trying to solve, is how to iterate over all the rows and add together the Number_Count, each time they have the same NameOfProduct, and then in the end, give me all the unique values with overall sum. There around 750 NameOfProducts and maybe 400 unique NameOfProduct. Here is a part of the data. NameOfProduct Number_Count Number_of_times 0 Minilæssere 1 1 1 Pallegafler 1 1 2 Rendegravere 1 1 3 Saltspreder 1 1 4 Sneplov 2 1 5 Vogne 2 1 6 Gps 1 1 7 Skovle 1 1 8 Brakslåmaskiner 2 1 9 River Og Vendere 16 1 10 Selvkørende Finsnittere 7 1Hope it makes sense :/ Hope you can help! :) Thank you! RE: Iterate over data and sum - ClimbAddict - Oct-13-2019 I would try using the dictionary Unique = {} for row in data_set: Product_name = row[1] Count = row[2] if Product_name in Unique: Unique = Unique[Product_name] + Count RE: Iterate over data and sum - Malt - Oct-14-2019 (Oct-13-2019, 10:11 PM)ClimbAddict Wrote: I would try using the dictionary what if the Product_name is not in Unique? and where we are storing Product_name against its count RE: Iterate over data and sum - scidam - Oct-14-2019 This is where we need to use data frame's groupby method. Look at the following minimal example and adopt it for your needs: import pandas as pd df = pd.DataFrame({'x': ['one', 'one', 'two', 'three', 'three', 'three'], 'y': [1,2,3,4,5,6]}) df.groupby('x').sum() RE: Iterate over data and sum - perfringo - Oct-14-2019 The existing datastructure (dataset?) and the objective is somewhat unclear for me, but if I would have had file named 'count_data.txt' with following content: I would just simple brute-force:with open('count_data.txt', 'r') as f: data = list(DictReader(f.readlines())) unique = {row['Name'] for row in data} for name in unique: print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')which will give: As 'data' is list of dictionaries one can use list comprehension for whatever filtering/subtotaling needed.
RE: Iterate over data and sum - Madame32 - Oct-14-2019 Thank you for all the answers guys! Just to answer the goal of the task. The dataset consists of data from 15 different webpages, who sell the same products. Now I have the categories from each webpage and the number of products in each category. The task is to create a final frame, which tells me the overall sum of products in each category from all the websites. Hope it makes sense ;) That is why I need to iterate through the data and output new names with a new number. (Oct-14-2019, 11:37 AM)perfringo Wrote: The existing datastructure (dataset?) and the objective is somewhat unclear for me, but if I would have had file named 'count_data.txt' with following content: I might be asking stupid, when I try to output the code import csv with open('data.csv', 'r') as f: data = list(DictReader(f.readlines())) unique = {row['Name'] for row in data} for name in unique: print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')I get the message: NameError: name 'DictReader' is not defined (Oct-14-2019, 11:04 AM)scidam Wrote: This is where we need to use data frame's groupby method. Look at the following minimal example and adopt it for your needs: But wont this just iterate over one row, and tell me the number of times a variable is unique? I am trying to iterate over both "NameOfProduct" and "Number_Count", and each time it meets a variable again in "NameOfProduct" it sum "Number_count", and in the end create a new column with the total sum. Thank you for your help! (Oct-14-2019, 03:58 PM)Madame32 Wrote: Thank you for all the answers guys! I might be asking stupid, when I try to output the code import csv with open('data.csv', 'r') as f: data = list(DictReader(f.readlines())) unique = {row['Name'] for row in data} for name in unique: print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')I get the message: NameError: name 'DictReader' is not defined (Oct-14-2019, 03:58 PM)Madame32 Wrote: Thank you for all the answers guys! I might be asking stupid, when I try to output the code import csv with open('data.csv', 'r') as f: data = list(DictReader(f.readlines())) unique = {row['Name'] for row in data} for name in unique: print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')I get the message: NameError: name 'DictReader' is not defined Okay, due to me being new on this forum! Just ignore the post I just posted. It completely messed up! Just to answer the goal of the task. The dataset consists of data from 15 different webpages, who sell the same products. Now I have the categories from each webpage and the number of products in each category. The task is to create a final frame, which tells me the overall sum of products in each category from all the websites. Hope it makes sense ;) That is why I need to iterate through the data and output new names with a new number. For Perfringo: I tried import csv with open('data.csv', 'r') as f: data = list(DictReader(f.readlines())) unique = {row['Name'] for row in data} for name in unique: print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')I got an error message saying: NameError: name 'DictReader' is not defined Thank you for the help! RE: Iterate over data and sum - perfringo - Oct-14-2019 My mistake. There should have been first row: from csv import DictReader RE: Iterate over data and sum - Madame32 - Oct-14-2019 (Oct-14-2019, 04:36 PM)perfringo Wrote: My mistake. There should have been first row: I simply must be stupid. How come it wont understand it ? from csv import DictReader import csv with open('data.csv', 'r') as f: data = list(DictReader(f.readlines())) unique = {row['NameOfProduct'] for row in data} for name in unique: print(f'{name}: {sum(int(row["Number_Count"]) for row in data if row["NameOfProduct"] == name)}')KeyError: 'NameOfProduct' Btw: How can I combine two variables into one in order to obtain an overall frequency is probably how I should have startet the post ;) RE: Iterate over data and sum - Madame32 - Oct-14-2019 Problem solved! :) Thank you for all the replies ! |