Thank you for all the answers, guys!
Just to clarify the goal of the task: the dataset consists of data from 15 different websites that sell the same products. I have the categories from each website and the number of products in each category. The task is to create a final data frame that gives the overall number of products in each category across all the websites. I hope that makes sense ;)
That is why I need to iterate through the data and output each name with a new total.
(Oct-14-2019, 11:37 AM)perfringo Wrote: The existing data structure (dataset?) and the objective are somewhat unclear to me, but if I had a file named 'count_data.txt' with the following content:
Output:
Name,count
abc,1
ABC,2
Abc,1
abc,5
ABC,1
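I would simply brute-force it:
from csv import DictReader

# Read every row into a list of dicts keyed by the header line
with open('count_data.txt', 'r', newline='') as f:
    data = list(DictReader(f))

# Collect the distinct names, then sum the counts for each one
unique = {row['Name'] for row in data}
for name in unique:
    print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')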
which will give:
Output:
Abc: 1
ABC: 3
abc: 6
As 'data' is a list of dictionaries, one can use a list comprehension for whatever filtering or subtotaling is needed.
This might be a stupid question, but when I try to run the code
import csv
with open('data.csv', 'r') as f:
    data = list(DictReader(f.readlines()))
unique = {row['Name'] for row in data}
for name in unique:
    print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
I get the message:
NameError: name 'DictReader' is not defined
(Oct-14-2019, 11:04 AM)scidam Wrote: This is where we need to use the data frame's groupby method. Look at the following minimal example and adapt it to your needs:
import pandas as pd
df = pd.DataFrame({'x': ['one', 'one', 'two', 'three', 'three', 'three'], 'y': [1,2,3,4,5,6]})
df.groupby('x').sum()
But won't this just iterate over one row and tell me the number of times each value occurs?
I am trying to iterate over both "NameOfProduct" and "Number_Count", so that each time a name appears again in "NameOfProduct" its "Number_Count" is added to the sum, and in the end create a new column with the total sum.
Thank you for your help!
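For what it's worth, here is a minimal sketch of how groupby could be applied to the two columns described above. The column names "NameOfProduct" and "Number_Count" come from the post; the sample rows and the "Total_Count" name are made up for illustration, so treat this as an assumption about the data layout rather than the actual dataset.

```python
import pandas as pd

# Hypothetical sample: the same categories reported by several websites.
df = pd.DataFrame({
    "NameOfProduct": ["shoes", "hats", "shoes", "bags", "hats", "shoes"],
    "Number_Count": [10, 4, 7, 3, 6, 1],
})

# One row per category: group by name and add up the counts in each group.
totals = (
    df.groupby("NameOfProduct", as_index=False)["Number_Count"]
      .sum()
      .rename(columns={"Number_Count": "Total_Count"})
)
print(totals)

# Alternatively, keep every original row and attach the group total
# as a new column ("Total_Count" is an illustrative name).
df["Total_Count"] = df.groupby("NameOfProduct")["Number_Count"].transform("sum")
print(df)
```

So groupby does not just count how often a name occurs: with .sum() it adds up the "Number_Count" values of all rows that share the same name. The first form gives one summary row per category, while transform("sum") keeps the original rows and repeats the total next to each of them.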
Okay, I am new to this forum, so please ignore my previous post. It got completely messed up!
Just to clarify the goal of the task: the dataset consists of data from 15 different websites that sell the same products. I have the categories from each website and the number of products in each category. The task is to create a final data frame that gives the overall number of products in each category across all the websites. I hope that makes sense ;)
That is why I need to iterate through the data and output each name with a new total.
For Perfringo:
I tried
import csv
with open('data.csv', 'r') as f:
    data = list(DictReader(f.readlines()))
unique = {row['Name'] for row in data}
for name in unique:
    print(f'{name}: {sum(int(row["count"]) for row in data if row["Name"] == name)}')
I got an error message saying: NameError: name 'DictReader' is not defined
Thank you for the help!
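A likely cause of the NameError: "import csv" only brings the module name csv into scope, so DictReader has to be written as csv.DictReader (or imported with "from csv import DictReader"). Below is a small sketch of the same subtotaling with that fix applied. It uses io.StringIO as a stand-in for the real file so it runs on its own, and it accumulates the totals in a single pass with a defaultdict, which is a swapped-in alternative to the brute-force loop, not the code from the quoted post.

```python
import csv
import io
from collections import defaultdict

# Stand-in for open('data.csv'); same content as the count_data.txt example.
sample = io.StringIO("Name,count\nabc,1\nABC,2\nAbc,1\nabc,5\nABC,1\n")

totals = defaultdict(int)
# DictReader lives in the csv module, so it must be qualified as csv.DictReader.
for row in csv.DictReader(sample):
    totals[row["Name"]] += int(row["count"])

for name, total in totals.items():
    print(f"{name}: {total}")
```

With a real file, the StringIO object would be replaced by something like "with open('data.csv', newline='') as f:" and the loop would read from csv.DictReader(f). Note that the names are matched case-sensitively here, so 'abc', 'ABC' and 'Abc' stay separate, as in the quoted output.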