Pandas - compute means per category and time
Python Forum (https://python-forum.io), Forum: Data Science (https://python-forum.io/forum-44.html), Thread: /thread-30889.html
Pandas - compute means per category and time - rama27 - Nov-11-2020

Hi, I have the following problem. I need to compute means per category and time (the time column in my df is named "round"). My simplified df looks like this:

    df = pd.DataFrame(data={
        'name': ["a", "a", "a", "b", "b", "c"],
        'value': [5, 4, 3, 4, 2, 1],
        'round': [1, 2, 3, 1, 2, 1],
    })

Desired output is:

    new_df = pd.DataFrame(data={
        'name': ["a", "a", "a", "b", "b", "c"],
        'value': [5, 4, 3, 1, 2, 1],
        'round': [1, 2, 3, 1, 2, 1],
        'mean_per_round': [5, 4.5, 4, 1, 1.5, 1],
    })

I am trying to use the .shift() function, but it doesn't help. Thanks for any suggestions!

RE: Pandas - compute means per category and time - jefsummers - Nov-11-2020

How are you calculating those means? You only have 3 rounds. I kind of get the first 3 values in mean_per_round but do not see how you arrive at the last 3 values.

RE: Pandas - compute means per category and time - rama27 - Nov-12-2020

(Nov-11-2020, 09:07 PM)jefsummers Wrote: How are you calculating those means? You only have 3 rounds. I kind of get the first 3 values in mean_per_round but do not see how you arrive at the last 3 values.

Sorry, I wrote 4 instead of 1! Is it clear now?

    new_df = pd.DataFrame(data={
        'name': ["a", "a", "a", "b", "b", "c"],
        'value': [5, 4, 3, 1, 2, 1],
        'round': [1, 2, 3, 1, 2, 1],
        'mean_per_round': [5, (4+5)/2, (3+4+5)/3, 1, (1+2)/2, 1],
    })

RE: Pandas - compute means per category and time - PsyPy - Nov-12-2020

(Nov-11-2020, 06:52 PM)rama27 Wrote: Hi, I have the following problem. I need to compute means per category and time (the time column in my df is named "round"). My simplified df looks like this:

Hey there! I might be missing something here, but can't you use the groupby method available on the DataFrame object?

RE: Pandas - compute means per category and time - jefsummers - Nov-12-2020

Not clear. Why is the 4th item, which is round 1, (5+1)/2? You have 2 round 1 entries by this time. And the 5th would be (4+2)/2? Still not getting how you arrive at your values.
And I agree, groupby or other aggregating functions might work, but not with the formulas I am seeing...

RE: Pandas - compute means per category and time - rama27 - Nov-12-2020

(Nov-12-2020, 04:49 PM)jefsummers Wrote: Not clear. Why is the 4th item, which is round 1, (5+1)/2? You have 2 round 1 entries by this time. And the 5th would be (4+2)/2? Still not getting how you arrive at your values.

OK, so once again, sorry :) Look at this df:

    df = pd.DataFrame(data={
        'name': ["a", "a", "a", "b", "b", "c"],
        'value': [5, 4, 3, 1, 2, 1],
        'round': [1, 2, 3, 1, 2, 1],
    })

In the first round I don't compute any mean, so 'mean_per_round':[5, , , 1, , 1 ]. In the second round, I compute the mean of the values from the first and second rounds, so 'mean_per_round':[,4.5 , , ,1.5 , ]. Similarly, in the third round I compute the mean of the values from the first, second and third rounds, so 'mean_per_round':[, ,4 , , , ]. I work with an unbalanced dataset, so I have no values of "b" in the third round and no values of "c" in the second and third rounds. Putting it together, the new df will look like this:

    new_df = pd.DataFrame(data={
        'name': ["a", "a", "a", "b", "b", "c"],
        'value': [5, 4, 3, 1, 2, 1],
        'round': [1, 2, 3, 1, 2, 1],
        'mean_per_round': [5, 4.5, 4, 1, 1.5, 1],
    })

Is it clear now? I tried groupby, but this gives me something different:

    df.groupby(['name', 'round']).mean()

RE: Pandas - compute means per category and time - jefsummers - Nov-12-2020

I don't think there is an easy process. I would:

1. Create a pd.Series that is the same length as your dataframe.
2. Do your 3 rounds, calculating the values and adjusting the values in the series. Use iloc to pull the individual values.
3. Append the series to the dataframe (new column).

RE: Pandas - compute means per category and time - PsyPy - Nov-13-2020

Alright, I see. You might still want to use groupby(), but together with the expanding() method.
This seems to work on the example DataFrame:

    # Note: .values assigns by position, so this relies on df already being sorted by name
    df['mean_per_round'] = df.groupby(['name'])['value'].expanding().mean().values
    print(df)
    # Out[12]:
    #   name  value  round  mean_per_round
    # 0    a      5      1             5.0
    # 1    a      4      2             4.5
    # 2    a      3      3             4.0
    # 3    b      1      1             1.0
    # 4    b      2      2             1.5
    # 5    c      1      1             1.0

Hope you solve your issue!
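A possible variation on the expanding-mean approach above, for data that is not already sorted by name and round. This is only a sketch under assumptions: the helper name add_mean_per_round and the shuffled example frame are made up for illustration, and it assumes each name has at most one row per round. It sorts by round, computes the running mean within each name, and lets pandas align the result back on the original index instead of assigning by position with .values.

```python
import pandas as pd


def add_mean_per_round(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a running mean of 'value' per 'name', in 'round' order.

    Hypothetical helper (not from the thread); column names follow the example df.
    """
    out = df.copy()
    # Put the rows in time order so the expanding window grows round by round.
    ordered = out.sort_values("round")
    # transform keeps the index of `ordered`, i.e. the original row labels.
    running = ordered.groupby("name")["value"].transform(
        lambda s: s.expanding().mean()
    )
    # Column assignment aligns on the index, so each row gets its own mean
    # even though `ordered` is in a different row order than `out`.
    out["mean_per_round"] = running
    return out


# Same data as the thread's example, but with the rows deliberately shuffled.
shuffled = pd.DataFrame({
    "name": ["a", "b", "a", "c", "b", "a"],
    "value": [3, 2, 5, 1, 1, 4],
    "round": [3, 2, 1, 1, 1, 2],
})

result = add_mean_per_round(shuffled)
print(result)
```

With the rows in this order, assigning the grouped result through .values would attach means to the wrong rows, which is why the sketch relies on index alignment.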