Python Forum
Pandas - cumulative sum of two columns
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Pandas - cumulative sum of two columns
#1
I want to calculate points for soccer teams. I have points for the hometeam and awayteam for each game and i want to get the sum for previous games. (not include the row i'm standing on) This is a part of the dataframe (i have more columns then displayed in the example):

 

df  = pd.DataFrame([
["Gothenburg", "Malmo", 2018, 1, 1],
["Malmo","Gothenburg",  2018, 1, 1],
["Malmo", "Gothenburg", 2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Gothenburg", "Malmo" ,2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Gothenburg", "Malmo", 2018, 0, 3],
["Malmo", "Gothenburg", 2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Malmo", "Gothenburg", 2018, 0, 3],
[ "Malmo","Gothenburg", 2018, 1, 1],
[ "Malmo", "Gothenburg",2018, 0, 3],
])
The desired output is:

        H_team      A_team  Year  H_points  A_points  H_cumsum  A_cumsum
0   Gothenburg       Malmo  2018         1         1       NaN       NaN
1        Malmo  Gothenburg  2018         1         1       1.0       1.0
2        Malmo  Gothenburg  2018         0         3       2.0       2.0
3   Gothenburg       Malmo  2018         1         1       5.0       2.0
4   Gothenburg       Malmo  2018         0         3       6.0       3.0
5   Gothenburg       Malmo  2018         1         1       6.0       6.0
6   Gothenburg       Malmo  2018         0         3       7.0       7.0
7        Malmo  Gothenburg  2018         0         3       10.0      7.0
8   Gothenburg       Malmo  2018         1         1       10.0      10.0
9        Malmo  Gothenburg  2018         0         3       11.0      11.0
10       Malmo  Gothenburg  2018         1         1       11.0      14.0
11       Malmo  Gothenburg  2018         0         3       12.0      15.0
I can calculate total points using stack, but since i have around 10 columns (all not displayed in this example), the result gets really messy.

df.columns = [['Team', 'Team', "Year", 'Points', 'Points'],
    ['Home', 'Away', 'Year', 'Home', 'Away']]

d1 = df.stack()
total = d1.groupby('Team').Points.apply(lambda x: x.shift().cumsum())
df = d1.assign(Total=total).unstack()

print(df)

   Points                  Team                  Year              Total           
     Away Home Year        Away        Home Year Away Home    Year  Away  Home Year
0     1.0  1.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   NaN   NaN  NaN
1     1.0  1.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0   1.0   1.0  NaN
2     3.0  0.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0   2.0   2.0  NaN
3     1.0  1.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   2.0   5.0  NaN
4     3.0  0.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   3.0   6.0  NaN
5     1.0  1.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   6.0   6.0  NaN
6     3.0  0.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   7.0   7.0  NaN
7     3.0  0.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0   7.0  10.0  NaN
8     1.0  1.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0  10.0  10.0  NaN
9     3.0  0.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0  11.0  11.0  NaN
10    1.0  1.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0  14.0  11.0  NaN
11    3.0  0.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0  15.0  12.0  NaN
I have also tried groupby home- and awayteam, but that doesn't give me the total points:

df.columns = ['Hometeam', 'Awayteam', "Year", 'homepoints', 'awayponts']
df["Homepoints_tot"] = df.groupby(['Hometeam', "Year"])['homepoints'].transform(lambda x: x.cumsum().shift())
df["Awaypoints_tot"] = df.groupby(['Awayteam', "Year"])['awayponts'].transform(lambda x: x.cumsum().shift())

      Hometeam    Awayteam  Year  homepoints  awayponts  Homepoints_tot  Awaypoints_tot
0   Gothenburg       Malmo  2018           1          1             NaN             NaN
1        Malmo  Gothenburg  2018           1          1             NaN             NaN
2        Malmo  Gothenburg  2018           0          3             1.0             1.0
3   Gothenburg       Malmo  2018           1          1             1.0             1.0
4   Gothenburg       Malmo  2018           0          3             2.0             2.0
5   Gothenburg       Malmo  2018           1          1             2.0             5.0
6   Gothenburg       Malmo  2018           0          3             3.0             6.0
7        Malmo  Gothenburg  2018           0          3             1.0             4.0
8   Gothenburg       Malmo  2018           1          1             3.0             9.0
9        Malmo  Gothenburg  2018           0          3             1.0             7.0
10       Malmo  Gothenburg  2018           1          1             1.0            10.0
11       Malmo  Gothenburg  2018           0          3             2.0            11.0
Any ideas how to get the desired output?
Reply


Messages In This Thread
Pandas - cumulative sum of two columns - by tobbs - May-19-2019, 09:29 AM

Possibly Related Threads…
Thread Author Replies Views Last Post
  iretate over columns in df and calculate euclidean distance with one column in pandas Pit292 0 3,314 May-09-2021, 06:46 PM
Last Post: Pit292
  pandas.to_datetime: Combine data from 2 columns ju21878436312 1 2,464 Feb-20-2021, 08:25 PM
Last Post: perfringo
  Remove extra count columns created by pandas groupby spyf8 1 2,741 Feb-10-2021, 09:19 AM
Last Post: Naheed
  Pandas: summing columns conditional on the column labels ddd2332 0 2,146 Sep-10-2020, 05:58 PM
Last Post: ddd2332
  Difference of two columns in Pandas dataframe zinho 2 3,383 Jun-17-2020, 03:36 PM
Last Post: zinho
  Pandas dataframe columns collapsed in Spyder when printing UniKlixX 2 4,765 Nov-04-2019, 07:00 AM
Last Post: UniKlixX
  [pandas] How to re-arrange DataFrame columns SriMekala 8 4,967 Jun-22-2019, 12:55 AM
Last Post: scidam
  comparing two columns two different files in pandas nuncio 0 2,415 Jun-06-2018, 01:04 PM
Last Post: nuncio
  subtruction of columns in pandas garikhgh0 3 2,664 May-14-2018, 10:31 AM
Last Post: volcano63

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020