Python Forum
Dataframe mean calculation problem: do we have to loop?

Dataframe mean calculation problem: do we have to loop? - sparkt - Aug-27-2020

Suppose we have a very simple dataframe.

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 5, 6, 7], 'B': [20, 30, 50, 90, 80]})
print(df)
   A   B
0  1  20
1  2  30
2  5  50
3  6  90
4  7  80

The question is simple: how do we create a third column 'C' such that the following is true?

df.C[0] = mean of all 10 numbers
df.C[1] = mean of 2, 5, 6, 7, 30, 50, 90, 80
df.C[2] = mean of 5, 6, 7, 50, 90, 80
df.C[3] = mean of 6, 7, 90, 80
df.C[4] = mean of 7, 80
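
To make the target concrete, here is a minimal sketch in plain Python that computes the five values straight from the definition above. The names `a`, `b`, `tail` and `expected` are only illustrative; they do not come from pandas.

```python
# The two columns of the example frame, as plain lists.
a = [1, 2, 5, 6, 7]
b = [20, 30, 50, 90, 80]

expected = []
for i in range(len(a)):
    # All values from row i to the last row, taken from both columns.
    tail = a[i:] + b[i:]
    expected.append(sum(tail) / len(tail))

print(expected)  # roughly 29.1, 33.75, 39.67, 45.75, 43.5
```

These are the numbers that column 'C' should end up holding.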

I've read dozens of relevant tutorials online but all of them only teach how to get a single mean for a single row.
Any help would be much appreciated.


RE: Dataframe mean calculation problem - sparkt - Aug-28-2020

I got it, though I expected something much simpler.

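df['C'] = 0.0
for i in df.index:
    # Use .loc for the assignment; chained indexing (df['C'][i] = ...) can
    # raise SettingWithCopyWarning or silently fail to update df.
    # i works as a position here because the index is the default 0..n-1.
    df.loc[i, 'C'] = df[['A', 'B']].iloc[i:].mean().mean()
print(df)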
But do we really have to use a for loop to visit every row? One of the best things about a DataFrame is that an operation can be applied to the whole frame in one go, without an explicit Python loop, which tends to hurt performance.

I hope there's a more elegant solution!
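
One possible loop-free approach, offered as a sketch rather than the definitive answer: each value of 'C' is (sum of everything from row i down) divided by (count of everything from row i down), so a reversed cumulative sum gives all the numerators at once. This assumes there are no missing values in 'A' or 'B'; the names `cols`, `row_sums`, `tail_sums` and `tail_counts` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 5, 6, 7], 'B': [20, 30, 50, 90, 80]})
cols = ['A', 'B']

# Sum of A and B within each row, as a plain NumPy array.
row_sums = df[cols].sum(axis=1).to_numpy()

# Reverse, take a running total, reverse back:
# element i is now the sum of all values in rows i..end.
tail_sums = row_sums[::-1].cumsum()[::-1]

# Number of values in rows i..end: (rows remaining) * (number of columns).
n_rows = len(df)
tail_counts = (n_rows - np.arange(n_rows)) * len(cols)

df['C'] = tail_sums / tail_counts
print(df)
```

For the example frame this should give the same 'C' values as the loop, while doing the work in a handful of array operations instead of one slice per row.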