Python Forum
Iterate over data and sum
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Iterate over data and sum
#6
Thank you for all the answers guys!

Just to answer the goal of the task. The dataset consists of data from 15 different webpages, who sell the same products. Now I have the categories from each webpage and the number of products in each category. The task is to create a final frame, which tells me the overall sum of products in each category from all the websites. Hope it makes sense ;)

That is why I need to iterate through the data and output new names with a new number.

(Oct-14-2019, 11:37 AM)perfringo Wrote: The existing datastructure (dataset?) and the objective is somewhat unclear for me, but if I would have had file named 'count_data.txt' with following content:

Output:
Name,count abc,1 ABC,2 Abc,1 abc,5 ABC,1
I would just simple brute-force:

with open('count_data.txt', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
which will give:

Output:
Abc: 1 ABC: 3 abc: 6
As 'data' is list of dictionaries one can use list comprehension for whatever filtering/subtotaling needed.


I might be asking stupid, when I try to output the code

import csv

with open('data.csv', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
I get the message: NameError: name 'DictReader' is not defined

(Oct-14-2019, 11:04 AM)scidam Wrote: This is where we need to use data frame's groupby method. Look at the following minimal example and adopt it for your needs:

import pandas as pd
df = pd.DataFrame({'x': ['one', 'one', 'two', 'three', 'three', 'three'], 'y': [1,2,3,4,5,6]})
df.groupby('x').sum()

But wont this just iterate over one row, and tell me the number of times a variable is unique?

I am trying to iterate over both "NameOfProduct" and "Number_Count", and each time it meets a variable again in "NameOfProduct" it sum "Number_count", and in the end create a new column with the total sum.

Thank you for your help!

(Oct-14-2019, 03:58 PM)Madame32 Wrote: Thank you for all the answers guys!

Just to answer the goal of the task. The dataset consists of data from 15 different webpages, who sell the same products. Now I have the categories from each webpage and the number of products in each category. The task is to create a final frame, which tells me the overall sum of products in each category from all the websites. Hope it makes sense ;)

That is why I need to iterate through the data and output new names with a new number.

[quote='perfringo' pid='94271' dateline='1571053023']
The existing datastructure (dataset?) and the objective is somewhat unclear for me, but if I would have had file named 'count_data.txt' with following content:

Output:
Name,count abc,1 ABC,2 Abc,1 abc,5 ABC,1
I would just simple brute-force:

with open('count_data.txt', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
which will give:

Output:
Abc: 1 ABC: 3 abc: 6
As 'data' is list of dictionaries one can use list comprehension for whatever filtering/subtotaling needed.


I might be asking stupid, when I try to output the code

import csv

with open('data.csv', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
I get the message: NameError: name 'DictReader' is not defined


(Oct-14-2019, 03:58 PM)Madame32 Wrote: Thank you for all the answers guys!

Just to answer the goal of the task. The dataset consists of data from 15 different webpages, who sell the same products. Now I have the categories from each webpage and the number of products in each category. The task is to create a final frame, which tells me the overall sum of products in each category from all the websites. Hope it makes sense ;)

That is why I need to iterate through the data and output new names with a new number.

[quote='perfringo' pid='94271' dateline='1571053023']
The existing datastructure (dataset?) and the objective is somewhat unclear for me, but if I would have had file named 'count_data.txt' with following content:

Output:
Name,count abc,1 ABC,2 Abc,1 abc,5 ABC,1
I would just simple brute-force:

with open('count_data.txt', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
which will give:

Output:
Abc: 1 ABC: 3 abc: 6
As 'data' is list of dictionaries one can use list comprehension for whatever filtering/subtotaling needed.


I might be asking stupid, when I try to output the code

import csv

with open('data.csv', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
I get the message: NameError: name 'DictReader' is not defined


Okay, due to me being new on this forum! Just ignore the post I just posted. It completely messed up!

Just to answer the goal of the task. The dataset consists of data from 15 different webpages, who sell the same products. Now I have the categories from each webpage and the number of products in each category. The task is to create a final frame, which tells me the overall sum of products in each category from all the websites. Hope it makes sense ;)

That is why I need to iterate through the data and output new names with a new number.

For Perfringo:

I tried

import csv
 
with open('data.csv', 'r') as f:
    data = list(DictReader(f.readlines()))
    unique = {row['Name'] for row in data}
    for name in unique:
        print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
I got an error message saying: NameError: name 'DictReader' is not defined

Thank you for the help!
Reply


Messages In This Thread
Iterate over data and sum - by Madame32 - Oct-13-2019, 07:37 PM
RE: Iterate over data and sum - by ClimbAddict - Oct-13-2019, 10:11 PM
RE: Iterate over data and sum - by Malt - Oct-14-2019, 10:36 AM
RE: Iterate over data and sum - by scidam - Oct-14-2019, 11:04 AM
RE: Iterate over data and sum - by perfringo - Oct-14-2019, 11:37 AM
RE: Iterate over data and sum - by Madame32 - Oct-14-2019, 03:58 PM
RE: Iterate over data and sum - by perfringo - Oct-14-2019, 04:36 PM
RE: Iterate over data and sum - by Madame32 - Oct-14-2019, 04:54 PM
RE: Iterate over data and sum - by Madame32 - Oct-14-2019, 05:54 PM

Possibly Related Threads…
Thread Author Replies Views Last Post
  Alternative approach to iterate numerous linear regressions with xlsx data? john_538 0 2,516 Apr-07-2018, 10:15 PM
Last Post: john_538

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020