Pandas - compute means per category and time

Pandas - compute means per category and time - rama27 - Nov-11-2020

Hi, I have the following problem. I need to compute means per category and time (named "round" in my df). My simplified df looks like the one below:

df = pd.DataFrame(data={ 'name':["a","a","a","b","b","c" ] , 'value':[5,4,3,4,2,1] , 'round':[1,2,3,1,2,1 ]})

Desired output is:

new_df = pd.DataFrame(data={ 'name':["a","a","a","b","b","c" ] , 'value':[5,4,3,1,2,1] , 'round':[1,2,3,1,2,1 ] ,  'mean_per_round':[5, 4.5 , 4, 1, 1.5, 1 ]  })

I am trying to use the .shift() function, but it doesn't help. Thanks for any suggestions!

RE: Pandas - compute means per category and time - jefsummers - Nov-11-2020

How are you calculating those means? You only have 3 rounds. I kind of get the first 3 values in mean_per_round but do not see how you arrive at the last 3 values.

RE: Pandas - compute means per category and time - rama27 - Nov-12-2020

(Nov-11-2020, 09:07 PM)jefsummers Wrote: How are you calculating those means? You only have 3 rounds. I kind of get the first 3 values in mean_per_round but do not see how you arrive at the last 3 values.

Sorry, I wrote 4 instead of 1! Is it clear now?

new_df = pd.DataFrame(data={ 'name':["a","a","a","b","b","c" ] , 'value':[5,4,3,1,2,1] , 'round':[1,2,3,1,2,1 ] ,  'mean_per_round':[5, (4+5)/2 , (3+4+5)/3, 1, (1+2)/2, 1 ]  })

RE: Pandas - compute means per category and time - PsyPy - Nov-12-2020

Hey there! I might be missing something here but can't you use the groupby method available in the DataFrame object?

RE: Pandas - compute means per category and time - jefsummers - Nov-12-2020

Not clear. Why is the 4th item, which is round 1, (5+1)/2? You have 2 round 1 entries by this time. And the 5th would be (4+2)/2? Still not getting how you arrive at your values.
And agree, groupby or other aggregating functions might work, but not by the formulas I am seeing...

RE: Pandas - compute means per category and time - rama27 - Nov-12-2020

(Nov-12-2020, 04:49 PM)jefsummers Wrote: Not clear. Why is the 4th item, which is round 1, (5+1)/2? You have 2 round 1 entries by this time. And the 5th would be (4+2)/2? Still not getting how you arrive at your values.
And agree, groupby or other aggregating functions might work, but not by the formulas I am seeing...

OK, so once again, sorry :) Look at this df:

df = pd.DataFrame(data={ 'name':["a","a","a","b","b","c" ] , 'value':[5,4,3,1,2,1] , 'round':[1,2,3,1,2,1 ]})

In the first round I don't compute any mean, so 'mean_per_round':[5, , , 1, , 1 ]. In the second round, I compute the mean of the values from the first and second rounds, so 'mean_per_round':[ ,4.5 , , ,1.5 , ]. Similarly, in the third round I compute the mean of the values from the first, second and third rounds, so 'mean_per_round':[ , ,4 , , , ]. I work with an unbalanced dataset, so there are no values for "b" in the third round and none for "c" in the second and third rounds.
Putting it together, the new df will look like this:

new_df = pd.DataFrame(data={ 'name':["a","a","a","b","b","c" ] , 'value':[5,4,3,1,2,1] , 'round':[1,2,3,1,2,1 ] ,  'mean_per_round':[5, 4.5 , 4, 1, 1.5, 1 ]  })

Is it clear now?

I tried groupby, but this gives me something different:

df.groupby(['name', 'round']).mean()
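
A minimal sketch of why this call gives something different, using the corrected df from this post: every (name, round) pair occurs exactly once, so each group holds a single row and its mean is just the original value. The variable name per_pair is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "a", "a", "b", "b", "c"],
    "value": [5, 4, 3, 1, 2, 1],
    "round": [1, 2, 3, 1, 2, 1],
})

# Grouping by both columns puts each row in its own group,
# so the "mean" of every group equals that row's value.
per_pair = df.groupby(["name", "round"])["value"].mean()
print(per_pair)
```

Nothing is accumulated across rounds here, which is why the result does not match the desired mean_per_round column.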

RE: Pandas - compute means per category and time - jefsummers - Nov-12-2020

I don't think there is an easy process. I would:
1. Create a pd.Series that is the same length as your dataframe.
2. Do your 3 rounds, calculating the values and adjusting the values in the series. Use iloc to pull the individual values.
3. Append the series to the dataframe (new column).
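
The steps above could be sketched roughly as follows. This is only one possible reading of the suggestion: it assumes the rows of each name are already in round order, and the helper names running_mean, totals and counts are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "a", "a", "b", "b", "c"],
    "value": [5, 4, 3, 1, 2, 1],
    "round": [1, 2, 3, 1, 2, 1],
})

# 1. A Series with the same index as the DataFrame, initially all NaN.
running_mean = pd.Series(index=df.index, dtype="float64")

# 2. Walk the rows, keeping a running sum and count per name.
#    Assumes rows of each name appear in round order.
totals = {}
counts = {}
for pos in range(len(df)):
    name = df["name"].iloc[pos]
    value = df["value"].iloc[pos]
    totals[name] = totals.get(name, 0) + value
    counts[name] = counts.get(name, 0) + 1
    running_mean.iloc[pos] = totals[name] / counts[name]

# 3. Attach the Series as a new column.
df["mean_per_round"] = running_mean
print(df)
```

An explicit loop like this is easy to follow but slow on large frames, so it is better suited to checking results than to production use.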

RE: Pandas - compute means per category and time - PsyPy - Nov-13-2020

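Alright, I see. You might still want to use groupby(), but combined with the expanding() method. This seems to work on the example DataFrame:

# Expanding (cumulative) mean of 'value' within each name; .values assigns by position, so df must already be sorted by name.
df['mean_per_round'] = df.groupby(['name'])['value'].expanding().mean().values
print(df)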

# Out[12]: 
#   name  value  round  mean_per_round
# 0    a      5      1             5.0
# 1    a      4      2             4.5
# 2    a      3      3             4.0
# 3    b      1      1             1.0
# 4    b      2      2             1.5
# 5    c      1      1             1.0

Hope you solve your issue!
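
One caveat worth noting: the grouped result comes back ordered by name, and .values assigns it by position, so the one-liner lines up only when the rows are already grouped by name and in round order, as they are in the example. A hedged variant that aligns on the index instead is sketched below. The shuffled row order is an assumption added here to show the difference and is not from the original post.

```python
import pandas as pd

# Same data as in the thread, but with the rows shuffled on purpose.
df = pd.DataFrame({
    "name": ["b", "a", "c", "a", "b", "a"],
    "value": [2, 5, 1, 3, 1, 4],
    "round": [2, 1, 1, 3, 1, 2],
})

# Sort so that expanding() sees each name's rounds in order.
ordered = df.sort_values(["name", "round"])

# The result has a (name, original index) MultiIndex; dropping the
# name level leaves the original row labels for index-aligned assignment.
expanding_mean = (
    ordered.groupby("name")["value"]
    .expanding()
    .mean()
    .reset_index(level=0, drop=True)
)

# Assignment aligns on the index, so each row gets its own running mean.
df["mean_per_round"] = expanding_mean
print(df.sort_values(["name", "round"]))
```

Because the assignment is matched on index labels rather than position, the original row order of df is left untouched.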
