Python Forum
Iterate over data and sum - Printable Version


Iterate over data and sum - Madame32 - Oct-13-2019

Hi guys.

I hope you can help. I have been thinking about how to solve this problem, but I can't. Basically, I have a dataset of around 800 rows, each with a name and a value. Many of the names appear more than once. What I am trying to do is iterate over all the rows, add together the Number_Count values for rows that share the same NameOfProduct, and in the end get every unique NameOfProduct with its overall sum. There are around 750 NameOfProducts and maybe 400 unique NameOfProduct.

Here is part of the data.

    NameOfProduct            Number_Count  Number_of_times
0   Minilæssere                         1                1
1   Pallegafler                         1                1
2   Rendegravere                        1                1
3   Saltspreder                         1                1
4   Sneplov                             2                1
5   Vogne                               2                1
6   Gps                                 1                1
7   Skovle                              1                1
8   Brakslåmaskiner                     2                1
9   River Og Vendere                   16                1
10  Selvkørende Finsnittere             7                1
Hope it makes sense :/
Hope you can help! :)

Thank you!


RE: Iterate over data and sum - ClimbAddict - Oct-13-2019

I would try using the dictionary

Unique = {}

for row in data_set:
    Product_name = row[1]
    Count = row[2]

    if Product_name in Unique:
        Unique = Unique[Product_name] + Count



RE: Iterate over data and sum - Malt - Oct-14-2019

(Oct-13-2019, 10:11 PM)ClimbAddict Wrote: I would try using the dictionary

Unique = {}

for row in data_set:
    Product_name = row[1]
    Count = row[2]

    if Product_name in Unique:
        Unique = Unique[Product_name] + Count

What if Product_name is not in Unique? And where are we storing each Product_name against its count?
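Both points can be handled by keeping the dictionary intact and writing the running total back under the product name. The following is only a minimal sketch: the sample rows are made up, and it assumes the name sits at row[1] and the count at row[2], as in the loop above.

```python
# Hypothetical sample rows: (index, product name, count), mirroring the
# row[1] / row[2] layout assumed in the loop above.
data_set = [
    (0, "Sneplov", 2),
    (1, "Vogne", 2),
    (2, "Sneplov", 3),
    (3, "Gps", 1),
]

totals = {}
for row in data_set:
    product_name = row[1]
    count = row[2]
    # dict.get returns 0 the first time a name is seen, so names that are
    # not yet in the dictionary need no separate branch.
    totals[product_name] = totals.get(product_name, 0) + count

print(totals)  # {'Sneplov': 5, 'Vogne': 2, 'Gps': 1}
```

Assigning to totals[product_name] rather than to totals itself is what keeps each name stored against its count.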


RE: Iterate over data and sum - scidam - Oct-14-2019

This is where we need to use the data frame's groupby method. Look at the following minimal example and adapt it to your needs:

import pandas as pd
df = pd.DataFrame({'x': ['one', 'one', 'two', 'three', 'three', 'three'], 'y': [1,2,3,4,5,6]})
df.groupby('x').sum()
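Adapted to the column names from the first post, the same idea might look like the sketch below. The rows are invented for illustration, and it assumes the data is already in a DataFrame with NameOfProduct and Number_Count columns.

```python
import pandas as pd

# Made-up rows using the column names from the original question.
df = pd.DataFrame({
    "NameOfProduct": ["Sneplov", "Vogne", "Sneplov", "Gps", "Vogne"],
    "Number_Count": [2, 2, 3, 1, 4],
})

# One row per unique product, with Number_Count summed over all its rows.
# as_index=False keeps NameOfProduct as an ordinary column.
totals = df.groupby("NameOfProduct", as_index=False)["Number_Count"].sum()
print(totals)
```

The result has one row per unique NameOfProduct, which is the "unique values with overall sum" frame the question asks for.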



RE: Iterate over data and sum - perfringo - Oct-14-2019

The existing data structure (dataset?) and the objective are somewhat unclear to me, but if I had a file named 'count_data.txt' with the following content:

Output:
Name,count
abc,1
ABC,2
Abc,1
abc,5
ABC,1

I would simply brute-force it:

with open('count_data.txt', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
which will give:

Output:
Abc: 1
ABC: 3
abc: 6

As 'data' is a list of dictionaries, one can use a list comprehension for whatever filtering or subtotaling is needed.


RE: Iterate over data and sum - Madame32 - Oct-14-2019

Thank you for all the answers guys!

Just to explain the goal of the task: the dataset consists of data from 15 different webpages that sell the same products. I have the categories from each webpage and the number of products in each category. The task is to create a final frame that tells me the overall sum of products in each category across all the websites. Hope it makes sense ;)

That is why I need to iterate through the data and output new names with a new number.

(Oct-14-2019, 11:37 AM)perfringo Wrote: The existing data structure (dataset?) and the objective are somewhat unclear to me, but if I had a file named 'count_data.txt' with the following content:

Output:
Name,count
abc,1
ABC,2
Abc,1
abc,5
ABC,1

I would simply brute-force it:

with open('count_data.txt', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
which will give:

Output:
Abc: 1
ABC: 3
abc: 6

As 'data' is a list of dictionaries, one can use a list comprehension for whatever filtering or subtotaling is needed.


This might be a stupid question, but when I try to run the code

import csv

with open('data.csv', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
I get the message: NameError: name 'DictReader' is not defined

(Oct-14-2019, 11:04 AM)scidam Wrote: This is where we need to use the data frame's groupby method. Look at the following minimal example and adapt it to your needs:

import pandas as pd
df = pd.DataFrame({'x': ['one', 'one', 'two', 'three', 'three', 'three'], 'y': [1,2,3,4,5,6]})
df.groupby('x').sum()

But won't this just iterate over one column and tell me how many times each value occurs?

I am trying to iterate over both "NameOfProduct" and "Number_Count": each time a name appears again in "NameOfProduct", its "Number_Count" should be added to the sum, and in the end a new column should hold the total sum.

Thank you for your help!
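On the question of a new column holding the total: groupby followed by sum adds up Number_Count per name rather than counting occurrences, and groupby followed by transform writes that per-name total back onto every original row. A small sketch with invented rows, where Total_Count is a hypothetical column name chosen for illustration:

```python
import pandas as pd

# Invented rows using the column names from the question.
df = pd.DataFrame({
    "NameOfProduct": ["Sneplov", "Vogne", "Sneplov", "Gps"],
    "Number_Count": [2, 2, 3, 1],
})

# transform("sum") returns one value per original row, so the result can
# be assigned directly as a new column (Total_Count is a made-up name).
df["Total_Count"] = df.groupby("NameOfProduct")["Number_Count"].transform("sum")
print(df)
```

Both Sneplov rows end up with a Total_Count of 5, while the frame keeps its original number of rows.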

Okay, I am new on this forum, so please ignore the post I just made. It got completely messed up!

Just to explain the goal of the task: the dataset consists of data from 15 different webpages that sell the same products. I have the categories from each webpage and the number of products in each category. The task is to create a final frame that tells me the overall sum of products in each category across all the websites. Hope it makes sense ;)

That is why I need to iterate through the data and output new names with a new number.

For Perfringo:

I tried

import csv
 
with open('data.csv', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
I got an error message saying: NameError: name 'DictReader' is not defined

Thank you for the help!


RE: Iterate over data and sum - perfringo - Oct-14-2019

My mistake. The first line should have been:

from csv import DictReader
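With that import in place, a complete version could look like the sketch below. It is not the exact code from the earlier post: it reads the sample content from an in-memory string instead of count_data.txt so that it runs on its own, and it sums in a single pass with a defaultdict rather than rescanning the data for every name.

```python
from collections import defaultdict
from csv import DictReader
from io import StringIO

# In-memory stand-in for count_data.txt, using the sample content
# from the earlier post.
sample = StringIO("Name,count\nabc,1\nABC,2\nAbc,1\nabc,5\nABC,1\n")

totals = defaultdict(int)  # missing names start at 0
for row in DictReader(sample):
    # DictReader yields strings, so the count has to be converted.
    totals[row["Name"]] += int(row["count"])

for name, total in totals.items():
    print(f"{name}: {total}")
```

To read a real file, the StringIO object would be replaced by the file object from open(), which DictReader accepts directly.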



RE: Iterate over data and sum - Madame32 - Oct-14-2019

(Oct-14-2019, 04:36 PM)perfringo Wrote: My mistake. The first line should have been:

from csv import DictReader

I must simply be stupid. Why won't it understand this?

from csv import DictReader
import csv
  
with open('data.csv', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['NameOfProduct'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["Number_Count"]) for row in data if row["NameOfProduct"] == name)}')
KeyError: 'NameOfProduct'

Btw: "How can I combine two variables into one in order to obtain an overall frequency?" is probably how I should have started the post ;)
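On the KeyError above: the thread does not say what caused it, but it means the header row that DictReader parsed has no column named exactly 'NameOfProduct'. One possible cause, and this is purely an assumption about data.csv, is a delimiter other than a comma. Printing fieldnames shows what the reader actually found:

```python
from csv import DictReader
from io import StringIO

# Hypothetical file content: semicolon-separated. This is only one
# possible cause of the KeyError, not a confirmed diagnosis.
text = "NameOfProduct;Number_Count\nSneplov;2\nVogne;2\n"

# With the default comma delimiter the whole header line is read as a
# single column name, so row['NameOfProduct'] would raise KeyError.
default_fields = DictReader(StringIO(text)).fieldnames
print(default_fields)  # ['NameOfProduct;Number_Count']

# Passing the real delimiter makes the expected keys available.
rows = list(DictReader(StringIO(text), delimiter=";"))
print(rows[0]["NameOfProduct"], rows[0]["Number_Count"])  # Sneplov 2
```

Other mismatches, such as extra whitespace or different capitalisation in the header, would show up in the same fieldnames printout.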


RE: Iterate over data and sum - Madame32 - Oct-14-2019

Problem solved! :) Thank you for all the replies!