Iterate over data and sum

Madame32 · (This post was last modified: Oct-14-2019, 04:22 PM by Madame32.)

Thank you for all the answers guys!

To clarify the goal of the task: the dataset consists of data from 15 different webpages that sell the same products. I have the categories from each webpage and the number of products in each category. The task is to create a final frame that gives the overall sum of products in each category across all the websites. Hope that makes sense ;)

That is why I need to iterate through the data and output new names with a new number.

(Oct-14-2019, 11:37 AM)perfringo Wrote: The existing datastructure (dataset?) and the objective is somewhat unclear for me, but if I would have had file named 'count_data.txt' with following content:
Output:
Name,count
abc,1
ABC,2
Abc,1
abc,5
ABC,1
I would just simple brute-force:
with open('count_data.txt', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
which will give:
Output:
Abc: 1
ABC: 3
abc: 6
As 'data' is list of dictionaries one can use list comprehension for whatever filtering/subtotaling needed.

This might be a stupid question, but when I try to run the code

import csv

with open('data.csv', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')

I get the message: NameError: name 'DictReader' is not defined
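
The NameError comes from how the name is imported: `import csv` binds only the module name `csv`, so the class has to be written as `csv.DictReader` (or imported with `from csv import DictReader`). A minimal sketch of the corrected snippet is below. It writes perfringo's sample rows to a temporary file so that it runs on its own; the temporary file and the `totals` dict are assumptions for illustration and are not part of the original post.

```python
import csv
import os
import tempfile

# Sample rows from perfringo's example, written to a temporary file so the
# snippet is self-contained (assumption: your real file would be 'data.csv').
sample = "Name,count\nabc,1\nABC,2\nAbc,1\nabc,5\nABC,1\n"
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write(sample)
    path = tmp.name

with open(path, "r", newline="") as f:
    # csv.DictReader, not bare DictReader: 'import csv' only binds the module name.
    data = list(csv.DictReader(f))

os.remove(path)  # clean up the temporary file

# Sum the 'count' column for each distinct 'Name'.
unique = {row["Name"] for row in data}
totals = {
    name: sum(int(row["count"]) for row in data if row["Name"] == name)
    for name in unique
}

for name in sorted(totals):
    print(f"{name}: {totals[name]}")
```

Passing the file object straight to `csv.DictReader` (instead of `f.readlines()`) also works and avoids building an intermediate list of lines.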

(Oct-14-2019, 11:04 AM)scidam Wrote: This is where we need to use data frame's groupby method. Look at the following minimal example and adopt it for your needs:
import pandas as pd
df = pd.DataFrame({'x': ['one', 'one', 'two', 'three', 'three', 'three'], 'y': [1,2,3,4,5,6]})
df.groupby('x').sum()

But won't this just go over one column and tell me how many times each value appears?

I am trying to iterate over both "NameOfProduct" and "Number_Count": each time a value appears again in "NameOfProduct", its "Number_Count" should be added to the sum for that name, and in the end a new column should hold the total sum.
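
For what it's worth, `groupby(...).sum()` does not count occurrences; it groups the rows by the key column and adds up the values of the other column within each group, which is the behaviour described above. A small sketch, assuming the columns are really named `NameOfProduct` and `Number_Count` as in the post; the sample values and the `Total` column name are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the post.
df = pd.DataFrame({
    "NameOfProduct": ["shoes", "hats", "shoes", "bags", "hats", "shoes"],
    "Number_Count": [10, 4, 7, 3, 6, 1],
})

# One row per product name, with Number_Count summed within each group.
totals = df.groupby("NameOfProduct", as_index=False)["Number_Count"].sum()

# Alternatively, keep every original row and add the group total as a new column.
df["Total"] = df.groupby("NameOfProduct")["Number_Count"].transform("sum")

print(totals)
print(df)
```

The first form gives the condensed "final frame"; the second is useful if the per-row data should stay visible next to the total.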

Thank you for your help!

Okay, I am new on this forum, so please ignore my previous post. It got completely messed up!

To clarify the goal of the task: the dataset consists of data from 15 different webpages that sell the same products. I have the categories from each webpage and the number of products in each category. The task is to create a final frame that gives the overall sum of products in each category across all the websites. Hope that makes sense ;)

That is why I need to iterate through the data and output new names with a new number.
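
If pandas is not a requirement, the same aggregation can be sketched with the standard library alone. The example below assumes each website's data has already been read into a dict mapping category to product count; the `site_counts` list and its values are hypothetical.

```python
from collections import Counter

# Hypothetical per-site data: category name -> number of products.
site_counts = [
    {"shoes": 10, "hats": 4},
    {"shoes": 7, "bags": 3},
    {"hats": 6, "shoes": 1},
]

overall = Counter()
for counts in site_counts:
    # Counter.update adds the counts per key rather than replacing them.
    overall.update(counts)

for category, total in sorted(overall.items()):
    print(f"{category}: {total}")
```

With 15 sites the loop stays the same; only the list of per-site dicts grows.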

For Perfringo:

I tried

import csv
 
with open('data.csv', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')

I got an error message saying: NameError: name 'DictReader' is not defined

Thank you for the help!
