Python Forum
Iterate over data and sum
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Iterate over data and sum
#1
Music 
Hi guys.

Hope you can helped. I have been thinking about how to solve this problem, but I cant. Basically, I have a dataset of around 800 rows, with a name that contains a value. The dataset have multiple of the same values. What I am trying to solve, is how to iterate over all the rows and add together the Number_Count, each time they have the same NameOfProduct, and then in the end, give me all the unique values with overall sum. There around 750 NameOfProducts and maybe 400 unique NameOfProduct.

Here is a part of the data.

NameOfProduct	         Number_Count	Number_of_times
0	Minilæssere	              1	               1
1	Pallegafler	              1	               1
2	Rendegravere	          1	               1
3	Saltspreder	              1            	   1
4	Sneplov	                  2	               1
5	Vogne	                  2	               1
6	Gps	                      1	               1
7	Skovle	                  1	               1
8	Brakslåmaskiner	          2	               1
9	River Og Vendere	      16           	   1
10	Selvkørende Finsnittere	  7	               1
Hope it makes sense :/
Hope you can help! :)

Thank you!
Reply
#2
I would try using the dictionary

Unique = {}

for row in data_set:
    Product_name = row[1]
    Count = row[2]

    if Product_name in Unique:
        Unique = Unique[Product_name] + Count
Reply
#3
(Oct-13-2019, 10:11 PM)ClimbAddict Wrote: I would try using the dictionary

Unique = {}

for row in data_set:
    Product_name = row[1]
    Count = row[2]

    if Product_name in Unique:
        Unique = Unique[Product_name] + Count

what if the Product_name is not in Unique? and where we are storing Product_name against its count
Reply
#4
This is where we need to use data frame's groupby method. Look at the following minimal example and adopt it for your needs:

import pandas as pd
df = pd.DataFrame({'x': ['one', 'one', 'two', 'three', 'three', 'three'], 'y': [1,2,3,4,5,6]})
df.groupby('x').sum()
Reply
#5
The existing datastructure (dataset?) and the objective is somewhat unclear for me, but if I would have had file named 'count_data.txt' with following content:

Output:
Name,count abc,1 ABC,2 Abc,1 abc,5 ABC,1
I would just simple brute-force:

with open('count_data.txt', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
which will give:

Output:
Abc: 1 ABC: 3 abc: 6
As 'data' is list of dictionaries one can use list comprehension for whatever filtering/subtotaling needed.
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Reply
#6
Thank you for all the answers guys!

Just to answer the goal of the task. The dataset consists of data from 15 different webpages, who sell the same products. Now I have the categories from each webpage and the number of products in each category. The task is to create a final frame, which tells me the overall sum of products in each category from all the websites. Hope it makes sense ;)

That is why I need to iterate through the data and output new names with a new number.

(Oct-14-2019, 11:37 AM)perfringo Wrote: The existing datastructure (dataset?) and the objective is somewhat unclear for me, but if I would have had file named 'count_data.txt' with following content:

Output:
Name,count abc,1 ABC,2 Abc,1 abc,5 ABC,1
I would just simple brute-force:

with open('count_data.txt', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
which will give:

Output:
Abc: 1 ABC: 3 abc: 6
As 'data' is list of dictionaries one can use list comprehension for whatever filtering/subtotaling needed.


I might be asking stupid, when I try to output the code

import csv

with open('data.csv', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
I get the message: NameError: name 'DictReader' is not defined

(Oct-14-2019, 11:04 AM)scidam Wrote: This is where we need to use data frame's groupby method. Look at the following minimal example and adopt it for your needs:

import pandas as pd
df = pd.DataFrame({'x': ['one', 'one', 'two', 'three', 'three', 'three'], 'y': [1,2,3,4,5,6]})
df.groupby('x').sum()

But wont this just iterate over one row, and tell me the number of times a variable is unique?

I am trying to iterate over both "NameOfProduct" and "Number_Count", and each time it meets a variable again in "NameOfProduct" it sum "Number_count", and in the end create a new column with the total sum.

Thank you for your help!

(Oct-14-2019, 03:58 PM)Madame32 Wrote: Thank you for all the answers guys!

Just to answer the goal of the task. The dataset consists of data from 15 different webpages, who sell the same products. Now I have the categories from each webpage and the number of products in each category. The task is to create a final frame, which tells me the overall sum of products in each category from all the websites. Hope it makes sense ;)

That is why I need to iterate through the data and output new names with a new number.

[quote='perfringo' pid='94271' dateline='1571053023']
The existing datastructure (dataset?) and the objective is somewhat unclear for me, but if I would have had file named 'count_data.txt' with following content:

Output:
Name,count abc,1 ABC,2 Abc,1 abc,5 ABC,1
I would just simple brute-force:

with open('count_data.txt', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
which will give:

Output:
Abc: 1 ABC: 3 abc: 6
As 'data' is list of dictionaries one can use list comprehension for whatever filtering/subtotaling needed.


I might be asking stupid, when I try to output the code

import csv

with open('data.csv', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
I get the message: NameError: name 'DictReader' is not defined


(Oct-14-2019, 03:58 PM)Madame32 Wrote: Thank you for all the answers guys!

Just to answer the goal of the task. The dataset consists of data from 15 different webpages, who sell the same products. Now I have the categories from each webpage and the number of products in each category. The task is to create a final frame, which tells me the overall sum of products in each category from all the websites. Hope it makes sense ;)

That is why I need to iterate through the data and output new names with a new number.

[quote='perfringo' pid='94271' dateline='1571053023']
The existing datastructure (dataset?) and the objective is somewhat unclear for me, but if I would have had file named 'count_data.txt' with following content:

Output:
Name,count abc,1 ABC,2 Abc,1 abc,5 ABC,1
I would just simple brute-force:

with open('count_data.txt', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
which will give:

Output:
Abc: 1 ABC: 3 abc: 6
As 'data' is list of dictionaries one can use list comprehension for whatever filtering/subtotaling needed.


I might be asking stupid, when I try to output the code

import csv

with open('data.csv', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
I get the message: NameError: name 'DictReader' is not defined


Okay, due to me being new on this forum! Just ignore the post I just posted. It completely messed up!

Just to answer the goal of the task. The dataset consists of data from 15 different webpages, who sell the same products. Now I have the categories from each webpage and the number of products in each category. The task is to create a final frame, which tells me the overall sum of products in each category from all the websites. Hope it makes sense ;)

That is why I need to iterate through the data and output new names with a new number.

For Perfringo:

I tried

import csv
 
with open('data.csv', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
I got an error message saying: NameError: name 'DictReader' is not defined

Thank you for the help!
Reply
#7
My mistake. There should have been first row:

from csv import DictReader
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Reply
#8
(Oct-14-2019, 04:36 PM)perfringo Wrote: My mistake. There should have been first row:

from csv import DictReader

I simply must be stupid. How come it wont understand it ?

from csv import DictReader
import csv
  
with open('data.csv', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['NameOfProduct'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["Number_Count"]) for row in data if row["NameOfProduct"] == name)}')
KeyError: 'NameOfProduct'

Btw: How can I combine two variables into one in order to obtain an overall frequency is probably how I should have startet the post ;)
Reply
#9
Problem solved! :) Thank you for all the replies !
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Alternative approach to iterate numerous linear regressions with xlsx data? john_538 0 2,491 Apr-07-2018, 10:15 PM
Last Post: john_538

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020