Python Forum
How to sum across variable columns in a dataframe - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: How to sum across variable columns in a dataframe (/thread-32192.html)



How to sum across variable columns in a dataframe - rennerom - Jan-27-2021

Hi there,

I end up doing lots of this type of data manipulation in excel as I haven't found a solid python solution yet.

Lets say this is my pandas dataframe, and I need to sum columns P1 through P5 into a new column called "Total". That's a fairly easy task.

data = {'ID': ['a','b','c','d','e'], 
        'Target': [2,4,1,2,5], 
        'P1': [30,50,22.2,1,24.4], 
        'P2':[40,66.34,33,5,8], 
        'P3':[59,100,41,6,30], 
        'P4':[35,130.22,12,0,21], 
        'P5':[22,90,10.5,1.1,19.9]
        }

df = pd.DataFrame(data)
df['Sum']=df.iloc[:,2:].sum(axis=1)
print(df)
Output:
ID Target P1 P2 P3 P4 P5 Total 0 a 2 30.0 40.00 59 35.00 22.0 186.00 1 b 4 50.0 66.34 100 130.22 90.0 436.56 2 c 1 22.2 33.00 41 12.00 10.5 118.70 3 d 2 1.0 5.00 6 0.00 1.1 13.10 4 e 5 24.4 8.00 30 21.00 19.9 103.30
But what if I don't always need to sum to P5, what if I want to stop at P4, or P3? What if where I stop depends on what some other variable? In this dataframe, I am using the Target column to dictate how "far out" I need to sum my P columns. So for example, if Target = 2, then "Total" would be P1 + P2. If Target = 3, then "Total" would be P1 + P2 + P3, and so on.

I would expect my "Totals" column to now look like this...
Output:
ID Target P1 P2 P3 P4 P5 Total 0 a 2 30.0 40.00 59 35.00 22.0 70.00 1 b 4 50.0 66.34 100 130.22 90.0 346.56 2 c 1 22.2 33.00 41 12.00 10.5 22.20 3 d 2 1.0 5.00 6 0.00 1.1 6.00 4 e 5 24.4 8.00 30 21.00 19.9 103.30
I just can't seem to get there.

The last of several attempts of this dynamic sum led me to try different combinations of iloc and sum. Something like this.

df['Total']=df.iloc[:,2:COLUMN_I_WANT_TO_SUM_TO].sum(axis=1)
where the COLUMN_I_WANT_TO_SUM_TO would be df['Target']

But I keep running into the same issues when I run this (or anything similar)

df['Total']=df.iloc[:,2:df['Target']].sum(axis=1)
While trouble shooting this error it seems it's related to conflicting indices (like the index bases are shifted if that makes sense?), but I don't quite have the python chops to trace it to its source.

Error:
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-67-dea282f63fac> in <module> ----> 1 df['Sum']=df.iloc[:,2:df['Target']].sum(axis=1) 2 print(df) /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py in __getitem__(self, key) 887 # AttributeError for IntervalTree get_value 888 return self.obj._get_value(*key, takeable=self._takeable) --> 889 return self._getitem_tuple(key) 890 else: 891 # we by definition only have the 0th axis /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup) 1452 return self._getitem_lowerdim(tup) 1453 -> 1454 return self._getitem_tuple_same_dim(tup) 1455 1456 def _get_list_axis(self, key, axis: int): /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py in _getitem_tuple_same_dim(self, tup) 773 continue 774 --> 775 retval = getattr(retval, self.name)._getitem_axis(key, axis=i) 776 # We should never have retval.ndim < self.ndim, as that should 777 # be handled by the _getitem_lowerdim call above. /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis) 1479 def _getitem_axis(self, key, axis: int): 1480 if isinstance(key, slice): -> 1481 return self._get_slice_axis(key, axis=axis) 1482 1483 if isinstance(key, list): /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py in _get_slice_axis(self, slice_obj, axis) 1511 1512 labels = obj._get_axis(axis) -> 1513 labels._validate_positional_slice(slice_obj) 1514 return self.obj._slice(slice_obj, axis=axis) 1515 /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexes/base.py in _validate_positional_slice(self, key) 3319 """ 3320 self._validate_indexer("positional", key.start, "iloc") -> 3321 self._validate_indexer("positional", key.stop, "iloc") 3322 self._validate_indexer("positional", key.step, "iloc") 3323 /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexes/base.py in _validate_indexer(self, form, key, kind) 5307 pass 5308 else: -> 5309 raise self._invalid_indexer(form, key) 5310 5311 def _maybe_cast_slice_bound(self, label, side: str_t, kind): TypeError: cannot do positional indexing on Index with these indexers [0 2 1 4 2 1 3 2 4 5 Name: Target, dtype: int64] of type Series



RE: How to sum across variable columns in a dataframe - nealc - Jan-28-2021

Hi Rennerom

That's a tricky one, maybe try apply:

import pandas as pd
from numpy import arange


data = {'ID': ['a','b','c','d','e'], 
        'Target': [2,4,1,2,5], 
        'P1': [30,50,22.2,1,24.4], 
        'P2':[40,66.34,33,5,8], 
        'P3':[59,100,41,6,30], 
        'P4':[35,130.22,12,0,21], 
        'P5':[22,90,10.5,1.1,19.9]
        }
 
df = pd.DataFrame(data)
df['Sum']=df.iloc[:,2:].sum(axis=1)
df["varsum"]=df.apply(lambda row: row[["P"+str(i) for i in arange(1,1+row["Target"])]].sum(), axis=1)
print(df)

  ID  Target    P1     P2   P3      P4    P5     Sum  varsum
0  a       2  30.0  40.00   59   35.00  22.0  186.00   70.00
1  b       4  50.0  66.34  100  130.22  90.0  436.56  346.56
2  c       1  22.2  33.00   41   12.00  10.5  118.70   22.20
3  d       2   1.0   5.00    6    0.00   1.1   13.10    6.00
4  e       5  24.4   8.00   30   21.00  19.9  103.30  103.30



RE: How to sum across variable columns in a dataframe - rennerom - Jan-31-2021

That was it! Thanks @nealc