How to sum across variable columns in a dataframe - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: General Coding Help (https://python-forum.io/forum-8.html) +--- Thread: How to sum across variable columns in a dataframe (/thread-32192.html) |
How to sum across variable columns in a dataframe - rennerom - Jan-27-2021 Hi there, I end up doing lots of this type of data manipulation in excel as I haven't found a solid python solution yet. Lets say this is my pandas dataframe, and I need to sum columns P1 through P5 into a new column called "Total". That's a fairly easy task. data = {'ID': ['a','b','c','d','e'], 'Target': [2,4,1,2,5], 'P1': [30,50,22.2,1,24.4], 'P2':[40,66.34,33,5,8], 'P3':[59,100,41,6,30], 'P4':[35,130.22,12,0,21], 'P5':[22,90,10.5,1.1,19.9] } df = pd.DataFrame(data) df['Sum']=df.iloc[:,2:].sum(axis=1) print(df) But what if I don't always need to sum to P5, what if I want to stop at P4, or P3? What if where I stop depends on what some other variable? In this dataframe, I am using the Target column to dictate how "far out" I need to sum my P columns. So for example, if Target = 2, then "Total" would be P1 + P2. If Target = 3, then "Total" would be P1 + P2 + P3, and so on. I would expect my "Totals" column to now look like this... I just can't seem to get there.The last of several attempts of this dynamic sum led me to try different combinations of iloc and sum. Something like this. df['Total']=df.iloc[:,2:COLUMN_I_WANT_TO_SUM_TO].sum(axis=1)where the COLUMN_I_WANT_TO_SUM_TO would be df['Target'] But I keep running into the same issues when I run this (or anything similar) df['Total']=df.iloc[:,2:df['Target']].sum(axis=1)While trouble shooting this error it seems it's related to conflicting indices (like the index bases are shifted if that makes sense?), but I don't quite have the python chops to trace it to its source.
RE: How to sum across variable columns in a dataframe - nealc - Jan-28-2021 Hi Rennerom That's a tricky one, maybe try apply: import pandas as pd from numpy import arange data = {'ID': ['a','b','c','d','e'], 'Target': [2,4,1,2,5], 'P1': [30,50,22.2,1,24.4], 'P2':[40,66.34,33,5,8], 'P3':[59,100,41,6,30], 'P4':[35,130.22,12,0,21], 'P5':[22,90,10.5,1.1,19.9] } df = pd.DataFrame(data) df['Sum']=df.iloc[:,2:].sum(axis=1) df["varsum"]=df.apply(lambda row: row[["P"+str(i) for i in arange(1,1+row["Target"])]].sum(), axis=1) print(df) ID Target P1 P2 P3 P4 P5 Sum varsum 0 a 2 30.0 40.00 59 35.00 22.0 186.00 70.00 1 b 4 50.0 66.34 100 130.22 90.0 436.56 346.56 2 c 1 22.2 33.00 41 12.00 10.5 118.70 22.20 3 d 2 1.0 5.00 6 0.00 1.1 13.10 6.00 4 e 5 24.4 8.00 30 21.00 19.9 103.30 103.30 RE: How to sum across variable columns in a dataframe - rennerom - Jan-31-2021 That was it! Thanks @nealc |