Hi there,
I end up doing lots of this type of data manipulation in excel as I haven't found a solid python solution yet.
Lets say this is my pandas dataframe, and I need to sum columns P1 through P5 into a new column called "Total". That's a fairly easy task.
I would expect my "Totals" column to now look like this...
The last of several attempts of this dynamic sum led me to try different combinations of iloc and sum. Something like this.
But I keep running into the same issues when I run this (or anything similar)
I end up doing lots of this type of data manipulation in excel as I haven't found a solid python solution yet.
Lets say this is my pandas dataframe, and I need to sum columns P1 through P5 into a new column called "Total". That's a fairly easy task.
data = {'ID': ['a','b','c','d','e'], 'Target': [2,4,1,2,5], 'P1': [30,50,22.2,1,24.4], 'P2':[40,66.34,33,5,8], 'P3':[59,100,41,6,30], 'P4':[35,130.22,12,0,21], 'P5':[22,90,10.5,1.1,19.9] } df = pd.DataFrame(data) df['Sum']=df.iloc[:,2:].sum(axis=1) print(df)
Output: ID Target P1 P2 P3 P4 P5 Total
0 a 2 30.0 40.00 59 35.00 22.0 186.00
1 b 4 50.0 66.34 100 130.22 90.0 436.56
2 c 1 22.2 33.00 41 12.00 10.5 118.70
3 d 2 1.0 5.00 6 0.00 1.1 13.10
4 e 5 24.4 8.00 30 21.00 19.9 103.30
But what if I don't always need to sum to P5, what if I want to stop at P4, or P3? What if where I stop depends on what some other variable? In this dataframe, I am using the Target column to dictate how "far out" I need to sum my P columns. So for example, if Target = 2, then "Total" would be P1 + P2. If Target = 3, then "Total" would be P1 + P2 + P3, and so on. I would expect my "Totals" column to now look like this...
Output: ID Target P1 P2 P3 P4 P5 Total
0 a 2 30.0 40.00 59 35.00 22.0 70.00
1 b 4 50.0 66.34 100 130.22 90.0 346.56
2 c 1 22.2 33.00 41 12.00 10.5 22.20
3 d 2 1.0 5.00 6 0.00 1.1 6.00
4 e 5 24.4 8.00 30 21.00 19.9 103.30
I just can't seem to get there.The last of several attempts of this dynamic sum led me to try different combinations of iloc and sum. Something like this.
df['Total']=df.iloc[:,2:COLUMN_I_WANT_TO_SUM_TO].sum(axis=1)where the
COLUMN_I_WANT_TO_SUM_TO
would be df['Target']
But I keep running into the same issues when I run this (or anything similar)
df['Total']=df.iloc[:,2:df['Target']].sum(axis=1)While trouble shooting this error it seems it's related to conflicting indices (like the index bases are shifted if that makes sense?), but I don't quite have the python chops to trace it to its source.
Error:---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-67-dea282f63fac> in <module>
----> 1 df['Sum']=df.iloc[:,2:df['Target']].sum(axis=1)
2 print(df)
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py in __getitem__(self, key)
887 # AttributeError for IntervalTree get_value
888 return self.obj._get_value(*key, takeable=self._takeable)
--> 889 return self._getitem_tuple(key)
890 else:
891 # we by definition only have the 0th axis
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
1452 return self._getitem_lowerdim(tup)
1453
-> 1454 return self._getitem_tuple_same_dim(tup)
1455
1456 def _get_list_axis(self, key, axis: int):
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py in _getitem_tuple_same_dim(self, tup)
773 continue
774
--> 775 retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
776 # We should never have retval.ndim < self.ndim, as that should
777 # be handled by the _getitem_lowerdim call above.
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
1479 def _getitem_axis(self, key, axis: int):
1480 if isinstance(key, slice):
-> 1481 return self._get_slice_axis(key, axis=axis)
1482
1483 if isinstance(key, list):
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py in _get_slice_axis(self, slice_obj, axis)
1511
1512 labels = obj._get_axis(axis)
-> 1513 labels._validate_positional_slice(slice_obj)
1514 return self.obj._slice(slice_obj, axis=axis)
1515
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexes/base.py in _validate_positional_slice(self, key)
3319 """
3320 self._validate_indexer("positional", key.start, "iloc")
-> 3321 self._validate_indexer("positional", key.stop, "iloc")
3322 self._validate_indexer("positional", key.step, "iloc")
3323
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexes/base.py in _validate_indexer(self, form, key, kind)
5307 pass
5308 else:
-> 5309 raise self._invalid_indexer(form, key)
5310
5311 def _maybe_cast_slice_bound(self, label, side: str_t, kind):
TypeError: cannot do positional indexing on Index with these indexers [0 2
1 4
2 1
3 2
4 5
Name: Target, dtype: int64] of type Series