Iterate over data and sum

Madame32 · (This post was last modified: Oct-13-2019, 07:37 PM by Madame32.)

Hi guys.

Hope you can helped. I have been thinking about how to solve this problem, but I cant. Basically, I have a dataset of around 800 rows, with a name that contains a value. The dataset have multiple of the same values. What I am trying to solve, is how to iterate over all the rows and add together the Number_Count, each time they have the same NameOfProduct, and then in the end, give me all the unique values with overall sum. There around 750 NameOfProducts and maybe 400 unique NameOfProduct.

Here is a part of the data.

NameOfProduct	         Number_Count	Number_of_times
0	Minilæssere	              1	               1
1	Pallegafler	              1	               1
2	Rendegravere	          1	               1
3	Saltspreder	              1            	   1
4	Sneplov	                  2	               1
5	Vogne	                  2	               1
6	Gps	                      1	               1
7	Skovle	                  1	               1
8	Brakslåmaskiner	          2	               1
9	River Og Vendere	      16           	   1
10	Selvkørende Finsnittere	  7	               1

Hope it makes sense :/
Hope you can help! :)

Thank you!

ClimbAddict · (This post was last modified: Oct-13-2019, 10:12 PM by ClimbAddict.)

I would try using the dictionary

Unique = {}

for row in data_set:
    Product_name = row[1]
    Count = row[2]

    if Product_name in Unique:
        Unique = Unique[Product_name] + Count

Malt · Oct-14-2019, 10:36 AM

(Oct-13-2019, 10:11 PM)ClimbAddict Wrote: I would try using the dictionary
Unique = {}

for row in data_set:
    Product_name = row[1]
    Count = row[2]

    if Product_name in Unique:
        Unique = Unique[Product_name] + Count

what if the Product_name is not in Unique? and where we are storing Product_name against its count

**scidam** · Oct-14-2019, 11:04 AM

This is where we need to use data frame's groupby method. Look at the following minimal example and adopt it for your needs:

import pandas as pd
df = pd.DataFrame({'x': ['one', 'one', 'two', 'three', 'three', 'three'], 'y': [1,2,3,4,5,6]})
df.groupby('x').sum()

**perfringo** · Oct-14-2019, 11:37 AM

The existing datastructure (dataset?) and the objective is somewhat unclear for me, but if I would have had file named 'count_data.txt' with following content:

Output:Name,count
abc,1
ABC,2
Abc,1
abc,5
ABC,1

I would just simple brute-force:

with open('count_data.txt', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')

which will give:

Output:Abc: 1
ABC: 3
abc: 6

As 'data' is list of dictionaries one can use list comprehension for whatever filtering/subtotaling needed.

Madame32 · (This post was last modified: Oct-14-2019, 04:22 PM by Madame32.)

Thank you for all the answers guys!

Just to answer the goal of the task. The dataset consists of data from 15 different webpages, who sell the same products. Now I have the categories from each webpage and the number of products in each category. The task is to create a final frame, which tells me the overall sum of products in each category from all the websites. Hope it makes sense ;)

That is why I need to iterate through the data and output new names with a new number.

(Oct-14-2019, 11:37 AM)perfringo Wrote: The existing datastructure (dataset?) and the objective is somewhat unclear for me, but if I would have had file named 'count_data.txt' with following content:
Output:Name,count
abc,1
ABC,2
Abc,1
abc,5
ABC,1
I would just simple brute-force:
with open('count_data.txt', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
which will give:
Output:Abc: 1
ABC: 3
abc: 6
As 'data' is list of dictionaries one can use list comprehension for whatever filtering/subtotaling needed.

I might be asking stupid, when I try to output the code

import csv

with open('data.csv', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')

I get the message: NameError: name 'DictReader' is not defined

(Oct-14-2019, 11:04 AM)scidam Wrote: This is where we need to use data frame's groupby method. Look at the following minimal example and adopt it for your needs:
import pandas as pd
df = pd.DataFrame({'x': ['one', 'one', 'two', 'three', 'three', 'three'], 'y': [1,2,3,4,5,6]})
df.groupby('x').sum()

But wont this just iterate over one row, and tell me the number of times a variable is unique?

I am trying to iterate over both "NameOfProduct" and "Number_Count", and each time it meets a variable again in "NameOfProduct" it sum "Number_count", and in the end create a new column with the total sum.

Thank you for your help!

(Oct-14-2019, 03:58 PM)Madame32 Wrote: Thank you for all the answers guys!

Just to answer the goal of the task. The dataset consists of data from 15 different webpages, who sell the same products. Now I have the categories from each webpage and the number of products in each category. The task is to create a final frame, which tells me the overall sum of products in each category from all the websites. Hope it makes sense ;)

That is why I need to iterate through the data and output new names with a new number.

[quote='perfringo' pid='94271' dateline='1571053023']
The existing datastructure (dataset?) and the objective is somewhat unclear for me, but if I would have had file named 'count_data.txt' with following content:
Output:Name,count
abc,1
ABC,2
Abc,1
abc,5
ABC,1
I would just simple brute-force:
with open('count_data.txt', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
which will give:
Output:Abc: 1
ABC: 3
abc: 6
As 'data' is list of dictionaries one can use list comprehension for whatever filtering/subtotaling needed.

I might be asking stupid, when I try to output the code

import csv

with open('data.csv', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')

I get the message: NameError: name 'DictReader' is not defined

(Oct-14-2019, 03:58 PM)Madame32 Wrote: Thank you for all the answers guys!

Just to answer the goal of the task. The dataset consists of data from 15 different webpages, who sell the same products. Now I have the categories from each webpage and the number of products in each category. The task is to create a final frame, which tells me the overall sum of products in each category from all the websites. Hope it makes sense ;)

That is why I need to iterate through the data and output new names with a new number.

[quote='perfringo' pid='94271' dateline='1571053023']
The existing datastructure (dataset?) and the objective is somewhat unclear for me, but if I would have had file named 'count_data.txt' with following content:
Output:Name,count
abc,1
ABC,2
Abc,1
abc,5
ABC,1
I would just simple brute-force:
with open('count_data.txt', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
which will give:
Output:Abc: 1
ABC: 3
abc: 6
As 'data' is list of dictionaries one can use list comprehension for whatever filtering/subtotaling needed.

I might be asking stupid, when I try to output the code

import csv

with open('data.csv', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')

I get the message: NameError: name 'DictReader' is not defined

Okay, due to me being new on this forum! Just ignore the post I just posted. It completely messed up!

Just to answer the goal of the task. The dataset consists of data from 15 different webpages, who sell the same products. Now I have the categories from each webpage and the number of products in each category. The task is to create a final frame, which tells me the overall sum of products in each category from all the websites. Hope it makes sense ;)

That is why I need to iterate through the data and output new names with a new number.

For Perfringo:

I tried

import csv
 
with open('data.csv', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')

I got an error message saying: NameError: name 'DictReader' is not defined

Thank you for the help!

**perfringo** · Oct-14-2019, 04:36 PM

My mistake. There should have been first row:

from csv import DictReader

Madame32 · Oct-14-2019, 04:54 PM

(Oct-14-2019, 04:36 PM)perfringo Wrote: My mistake. There should have been first row:
from csv import DictReader

I simply must be stupid. How come it wont understand it ?

from csv import DictReader
import csv
  
with open('data.csv', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['NameOfProduct'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["Number_Count"]) for row in data if row["NameOfProduct"] == name)}')

KeyError: 'NameOfProduct'

Btw: How can I combine two variables into one in order to obtain an overall frequency is probably how I should have startet the post ;)

Madame32 · Oct-14-2019, 05:54 PM

Problem solved! :) Thank you for all the replies !

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Alternative approach to iterate numerous linear regressions with xlsx data?	john_538	0	2,491	Apr-07-2018, 10:15 PM Last Post: john_538

Iterate over data and sum

User Panel Messages

Announcements