Python Forum
Pandas - compute means per category and time
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Pandas - compute means per category and time
#1
Hi, I have a following problem. I need to compute means per category and time (in my df named as a "round"). My simplified df looks like bellow:

df = pd.DataFrame(data={ 'name':["a","a","a","b","b","c" ] , 'value':[5,4,3,4,2,1] , 'round':[1,2,3,1,2,1 ]})
Desired output is:
new_df = pd.DataFrame(data={ 'name':["a","a","a","b","b","c" ] , 'value':[5,4,3,1,2,1] , 'round':[1,2,3,1,2,1 ] ,  'mean_per_round':[5, 4.5 , 4, 1, 1.5, 1 ]  })
I am trying to use .shift() function, but it doesn`t help. Thanks for any suggestions!
Reply
#2
How are you calculating those means? You only have 3 rounds. I kind of get the first 3 values in mean_per_round but do not see how you arrive at the last 3 values.
rama27 likes this post
Reply
#3
(Nov-11-2020, 09:07 PM)jefsummers Wrote: How are you calculating those means? You only have 3 rounds. I kind of get the first 3 values in mean_per_round but do not see how you arrive at the last 3 values.

Sorry, I wrote 4 instead of 1! Is it clear now?

new_df = pd.DataFrame(data={ 'name':["a","a","a","b","b","c" ] , 'value':[5,4,3,1,2,1] , 'round':[1,2,3,1,2,1 ] ,  'mean_per_round':[5, (4+5)/2 , (3+4+5)/3, 1, (1+2)/2, 1 ]  })
Reply
#4
(Nov-11-2020, 06:52 PM)rama27 Wrote: Hi, I have a following problem. I need to compute means per category and time (in my df named as a "round"). My simplified df looks like bellow:

df = pd.DataFrame(data={ 'name':["a","a","a","b","b","c" ] , 'value':[5,4,3,4,2,1] , 'round':[1,2,3,1,2,1 ]})
Desired output is:
new_df = pd.DataFrame(data={ 'name':["a","a","a","b","b","c" ] , 'value':[5,4,3,1,2,1] , 'round':[1,2,3,1,2,1 ] ,  'mean_per_round':[5, 4.5 , 4, 1, 1.5, 1 ]  })
I am trying to use .shift() function, but it doesn`t help. Thanks for any suggestions!

Hey there! I might be missing something here but can't you use the groupby method available in the DataFrame object?
Reply
#5
Not clear. Why is the 4th item, which is round 1, (5+1)/2? You have 2 round 1 entries by this time. And the 5th would be (4+2)/2? Still not getting how you arrive at your values.
And agree, groupby or other aggregating functions might work, but not by the formulas I am seeing...
Reply
#6
(Nov-12-2020, 04:49 PM)jefsummers Wrote: Not clear. Why is the 4th item, which is round 1, (5+1)/2? You have 2 round 1 entries by this time. And the 5th would be (4+2)/2? Still not getting how you arrive at your values.
And agree, groupby or other aggregating functions might work, but not by the formulas I am seeing...

OK, so once again, sorry :) Look at this df:
df = pd.DataFrame(data={ 'name':["a","a","a","b","b","c" ] , 'value':[5,4,3,1,2,1] , 'round':[1,2,3,1,2,1 ]})
In the first round I don`t compute any mean, so 'mean_per_round':[5, , , 1, , 1 ]. In the second round, I compute mean of the value from the second and first round. So 'mean_per_round':[,4.5 , , ,1.5 , ]. Similarly, in the third round I compute the average of the value from the first, second and third round. So 'mean_per_round':[, ,4 , , , ]. I work with unbalanced dataset, so I have no values of "b" in the third round and no values of "c" for the second and third round.
Putting it together, new df will look like this:

new_df = pd.DataFrame(data={ 'name':["a","a","a","b","b","c" ] , 'value':[5,4,3,1,2,1] , 'round':[1,2,3,1,2,1 ] ,  'mean_per_round':[5, 4.5 , 4, 1, 1.5, 1 ]  })
Is it clear now?

I tried groupby, but this gives me something different:
df.groupby(['name', 'round']).mean()
Reply
#7
I don't think there is an easy process. I would:
1. create a pd.Series that is the same length as your dataframe,
2. Do your 3 rounds, calculating the values and adjusting the values in the series. Use iloc to pull the individual values
3. append the series to the dataframe (new column)
Reply
#8
Alright. I see but you might still want to use groupby() but also with the expanding() method. This seems to work on the example DataFrame:

df['mean_per_round'] = df.groupby(['name'])['value'].expanding().mean().values
print(df)

# Out[12]: 
#   name  value  round  mean_per_round
# 0    a      5      1             5.0
# 1    a      4      2             4.5
# 2    a      3      3             4.0
# 3    b      1      1             1.0
# 4    b      2      2             1.5
# 5    c      1      1             1.0
Hope you solve your issue!

(Nov-12-2020, 07:32 PM)rama27 Wrote:
(Nov-12-2020, 04:49 PM)jefsummers Wrote: Not clear. Why is the 4th item, which is round 1, (5+1)/2? You have 2 round 1 entries by this time. And the 5th would be (4+2)/2? Still not getting how you arrive at your values.
And agree, groupby or other aggregating functions might work, but not by the formulas I am seeing...

OK, so once again, sorry :) Look at this df:
df = pd.DataFrame(data={ 'name':["a","a","a","b","b","c" ] , 'value':[5,4,3,1,2,1] , 'round':[1,2,3,1,2,1 ]})
In the first round I don`t compute any mean, so 'mean_per_round':[5, , , 1, , 1 ]. In the second round, I compute mean of the value from the second and first round. So 'mean_per_round':[,4.5 , , ,1.5 , ]. Similarly, in the third round I compute the average of the value from the first, second and third round. So 'mean_per_round':[, ,4 , , , ]. I work with unbalanced dataset, so I have no values of "b" in the third round and no values of "c" for the second and third round.
Putting it together, new df will look like this:

new_df = pd.DataFrame(data={ 'name':["a","a","a","b","b","c" ] , 'value':[5,4,3,1,2,1] , 'round':[1,2,3,1,2,1 ] ,  'mean_per_round':[5, 4.5 , 4, 1, 1.5, 1 ]  })
Is it clear now?

I tried groupby, but this gives me something different:
df.groupby(['name', 'round']).mean()
rama27 likes this post
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Parsing and summing time deltas (duration) onto bar + pie charts using pandas - - DRY Drone4four 2 513 Feb-10-2024, 06:04 PM
Last Post: Drone4four
  Pandas read csv file in 'date/time' chunks MorganSamage 4 1,647 Feb-13-2023, 11:24 AM
Last Post: MorganSamage
  K Means Clustering antouanet 0 660 Jan-30-2023, 01:18 PM
Last Post: antouanet
  pandas: Compute the % of the unique values in a column JaneTan 1 1,756 Oct-25-2021, 07:55 PM
Last Post: jefsummers
  Kaggle Titanic - new category placement snakes 0 1,650 Oct-18-2021, 07:53 PM
Last Post: snakes
  How does K-Means++ work in selecting initial centroids? JaneTan 1 1,885 Jul-30-2021, 08:56 AM
Last Post: Gribouillis
  Does a pandas have a date without a time? AlekseyPython 6 4,865 Feb-10-2021, 09:24 AM
Last Post: Naheed
  Using shift to compute the percent change in a time series new_to_python 6 4,062 Mar-03-2020, 07:50 PM
Last Post: new_to_python
  create 10 yearly blocks from time series using pandas Staph 1 1,929 Jul-23-2019, 12:01 PM
Last Post: Malt
  Simple String to Time within a pandas dataframe Ecniv 1 2,480 Jun-14-2019, 03:25 AM
Last Post: scidam

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020