Thank you for all the answers, guys!
To clarify the goal of the task: the dataset consists of data from 15 different websites that sell the same products. I have the categories from each website and the number of products in each category. The task is to create a final data frame that gives the overall number of products in each category across all the websites. Hope that makes sense ;)
That is why I need to iterate through the data and output new names with a new number.
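One way to picture this goal, as a minimal sketch rather than a definitive solution: assuming each website's data is already a pandas DataFrame with the columns "NameOfProduct" and "Number_Count" (assumed column names; the three small frames and their values below are invented purely for illustration), the per-site frames can be stacked and then summed per category without an explicit loop.

```python
import pandas as pd

# Hypothetical per-website frames; the real data would come from the 15 sites.
site_a = pd.DataFrame({"NameOfProduct": ["shoes", "hats"], "Number_Count": [10, 4]})
site_b = pd.DataFrame({"NameOfProduct": ["hats", "bags"], "Number_Count": [6, 3]})
site_c = pd.DataFrame({"NameOfProduct": ["shoes", "bags"], "Number_Count": [5, 7]})

# Stack all sites into one long frame, then sum the counts per category.
combined = pd.concat([site_a, site_b, site_c], ignore_index=True)
totals = combined.groupby("NameOfProduct", as_index=False)["Number_Count"].sum()

print(totals)
```

The result is a frame with one row per category and its combined count, which seems to match the "final frame" described above.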
(Oct-14-2019, 11:37 AM)perfringo Wrote: The existing data structure (dataset?) and the objective are somewhat unclear to me, but if I had a file named 'count_data.txt' with the following content:

Name,count
abc,1
ABC,2
Abc,1
abc,5
ABC,1

I would simply brute-force it:

from csv import DictReader

with open('count_data.txt', 'r') as f:
    data = list(DictReader(f.readlines()))

# Collect the distinct names, then sum the counts for each one
unique = {row['Name'] for row in data}

for name in unique:
    print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')

which will give:

Output:
Abc: 1
ABC: 3
abc: 6

As 'data' is a list of dictionaries, one can use a list comprehension for whatever filtering/subtotaling is needed.
I might be asking something stupid, but when I try to run the code

import csv

with open('data.csv', 'r') as f:
    data = list(DictReader(f.readlines()))

unique = {row['Name'] for row in data}

for name in unique:
    print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')

I get the message: NameError: name 'DictReader' is not defined
(Oct-14-2019, 11:04 AM)scidam Wrote: This is where we need to use the data frame's groupby method. Look at the following minimal example and adapt it to your needs:

import pandas as pd

df = pd.DataFrame({'x': ['one', 'one', 'two', 'three', 'three', 'three'],
                   'y': [1, 2, 3, 4, 5, 6]})
df.groupby('x').sum()
But won't this just iterate over one column and tell me how many times each value occurs?
I am trying to iterate over both "NameOfProduct" and "Number_Count", so that each time a value appears again in "NameOfProduct" its "Number_Count" is added to the running sum, and in the end a new column holds the total sum.
Thank you for your help!
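To illustrate the question above, here is a small sketch that reuses the names and counts from perfringo's example file, assuming the columns are called "NameOfProduct" and "Number_Count". As far as pandas is concerned, groupby(...).sum() adds up the values in "Number_Count" for all rows that share a name; it is value_counts() that only counts how often each name occurs. If the total should appear as a new column next to the original rows, transform("sum") is one option (the column name "Total" is just a placeholder).

```python
import pandas as pd

df = pd.DataFrame({
    "NameOfProduct": ["abc", "ABC", "Abc", "abc", "ABC"],
    "Number_Count": [1, 2, 1, 5, 1],
})

# Sums Number_Count over all rows that share the same name.
sums = df.groupby("NameOfProduct")["Number_Count"].sum()

# By contrast, this only counts how many rows each name has.
occurrences = df["NameOfProduct"].value_counts()

# Adds the per-name total as a new column beside every original row.
df["Total"] = df.groupby("NameOfProduct")["Number_Count"].transform("sum")

print(sums)
print(occurrences)
print(df)
```

So no manual iteration over the two columns is needed; the grouping does the "meet the name again and add its count" step internally.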
Okay, I am new on this forum, so please ignore the post I just made. It got completely messed up!
For Perfringo:
I tried
import csv

with open('data.csv', 'r') as f:
    data = list(DictReader(f.readlines()))

unique = {row['Name'] for row in data}

for name in unique:
    print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')

I got an error message saying: NameError: name 'DictReader' is not defined
Thank you for the help!
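The NameError comes from the import style: `import csv` only makes the module name `csv` available, so the class has to be referenced as `csv.DictReader` (or imported directly with `from csv import DictReader`). Below is a minimal sketch of the same subtotal with that change. Two things in it are assumptions for illustration: an in-memory string stands in for 'data.csv' so the snippet runs on its own, and a Counter replaces the nested loop, which is a swapped-in choice and not the method from the quoted post.

```python
import csv
import io
from collections import Counter

# Stand-in for open('data.csv'); same columns as the example file in the thread.
sample = io.StringIO("Name,count\nabc,1\nABC,2\nAbc,1\nabc,5\nABC,1\n")

totals = Counter()
for row in csv.DictReader(sample):  # note the csv. prefix
    totals[row["Name"]] += int(row["count"])

for name, total in totals.items():
    print(f"{name}: {total}")
```

With a real file, `sample` would be replaced by the file object from `open('data.csv', newline='')`.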