Python Forum
manipulating a dataframe - pandas
#1
Hello,

Whilst this is an assignment, I thought I would post in this forum page as it is pandas related...

I have a rubbish dataframe to use that lists sales amounts over a 6-month period. There are just two columns: a 'Country' column listing the names of countries that produced sales, and a 'Sales Amount' column that holds an integer value for the number of sales for that country.

It is quite rubbish in that the countries appear multiple times (some more than others), and I have the task of creating a new dataframe with the unique country names as the index, where the second column must be the total number of sales for each country. So if England appears 3 times in the list, with 1, 2 and 3 being the integers in the three rows, I would need to return England in the first column with 6 in the second.

I have no issue getting the unique values, but my brain is stumped on producing the second column as the total of each unique country's sales amounts.

So if I had the following basic dataframe...
df = pd.DataFrame({'Country': ['England', 'England', 'England', 'Wales', 'Wales', 'Wales'], 'Sales Amount': [1, 2, 3, 2, 7, 4]})
Giving the output of...
England 1
England 2
England 3
Wales 2
Wales 7
Wales 4

I would need to return a new dataframe (not a pivot table or other structure - the assignment says it must be a dataframe) looking like this...

Country Sales Amount
England 6
Wales 13

Any pointers (not answers) would be greatly appreciated.

I have tried to use a groupby() to populate the second column, but I just seem to get either 123 as the sales for England rather than 6, or I get 3 which is the number of times England appears in the Country column.
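The two symptoms described above can be reproduced with a small sketch. It rests on an assumption that the post does not confirm: that the 'Sales Amount' values were loaded as strings rather than integers. Under that assumption, summing concatenates the text ('1' + '2' + '3' gives '123'), while counting simply returns the number of rows per country. The column names follow the post; the variable names are only illustrative.

```python
import pandas as pd

# Assumption (not confirmed in the post): the sales values are strings.
# dtype=object keeps them as plain Python strings.
df = pd.DataFrame(
    {
        "Country": ["England", "England", "England", "Wales", "Wales", "Wales"],
        "Sales Amount": ["1", "2", "3", "2", "7", "4"],
    },
    dtype=object,
)

grouped = df.groupby("Country")["Sales Amount"]

# Summing strings joins them together instead of adding them up.
as_text = grouped.sum()

# count() reports how many rows each country has, not the sales total.
row_counts = grouped.count()

# Converting the column to numbers first makes sum() add the values.
df["Sales Amount"] = pd.to_numeric(df["Sales Amount"])
totals = df.groupby("Country")["Sales Amount"].sum()

print(as_text)
print(row_counts)
print(totals)
```

If this is the cause, checking df.dtypes before grouping would show the 'Sales Amount' column as object rather than an integer type.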

Apologies for the lack of code given, I am on my mobile and it is not easy to enter code with my very old and small phone screen.

Kind Regards
#2
Difficult to give you hints/pointers without giving the answer.
So, with .groupby() you were on the right track but didn't take the next step.

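import pandas as pd

# Sample data: one row per sales record
data = [['England', 1], ['England', 2], ['England', 3], ['Wales', 2], ['Wales', 7], ['Wales', 4]]
df = pd.DataFrame(data, columns=["country", "sales"])

# Sum the sales for each country; the country names become the index
new_df = df.groupby("country").sum()
# Move the index back into a regular "country" column
new_df.reset_index(level=0, inplace=True)

print(new_df)
print(type(new_df))  # confirms the result is a DataFrame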
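As a variation on the same idea, here is a minimal sketch that uses the column names from the original question ('Country' and 'Sales Amount') and passes as_index=False to groupby(), so the country names stay in an ordinary column without a separate reset_index() step. The sample values are the ones from the first post; this is one possible way to write it, not the only one.

```python
import pandas as pd

# Sample data from the first post, using its column names
df = pd.DataFrame(
    {
        "Country": ["England", "England", "England", "Wales", "Wales", "Wales"],
        "Sales Amount": [1, 2, 3, 2, 7, 4],
    }
)

# as_index=False keeps "Country" as a regular column in the result
new_df = df.groupby("Country", as_index=False)["Sales Amount"].sum()

print(new_df)
print(type(new_df))
```

If the assignment wants the country names as the index instead, leaving out as_index=False (and selecting the column with double brackets, [["Sales Amount"]]) should keep the result a DataFrame indexed by country.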
#3
ThomasL

Thanks. Good to know I was nearly there initially.

I will persevere and work through the example you have given.

Thanks for your help

