Python Forum
Pandas - cumulative sum of two columns
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Pandas - cumulative sum of two columns
#1
I want to calculate points for soccer teams. I have points for the hometeam and awayteam for each game and i want to get the sum for previous games. (not include the row i'm standing on) This is a part of the dataframe (i have more columns then displayed in the example):

 

df  = pd.DataFrame([
["Gothenburg", "Malmo", 2018, 1, 1],
["Malmo","Gothenburg",  2018, 1, 1],
["Malmo", "Gothenburg", 2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Gothenburg", "Malmo" ,2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Gothenburg", "Malmo", 2018, 0, 3],
["Malmo", "Gothenburg", 2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Malmo", "Gothenburg", 2018, 0, 3],
[ "Malmo","Gothenburg", 2018, 1, 1],
[ "Malmo", "Gothenburg",2018, 0, 3],
])
The desired output is:

        H_team      A_team  Year  H_points  A_points  H_cumsum  A_cumsum
0   Gothenburg       Malmo  2018         1         1       NaN       NaN
1        Malmo  Gothenburg  2018         1         1       1.0       1.0
2        Malmo  Gothenburg  2018         0         3       2.0       2.0
3   Gothenburg       Malmo  2018         1         1       5.0       2.0
4   Gothenburg       Malmo  2018         0         3       6.0       3.0
5   Gothenburg       Malmo  2018         1         1       6.0       6.0
6   Gothenburg       Malmo  2018         0         3       7.0       7.0
7        Malmo  Gothenburg  2018         0         3       10.0      7.0
8   Gothenburg       Malmo  2018         1         1       10.0      10.0
9        Malmo  Gothenburg  2018         0         3       11.0      11.0
10       Malmo  Gothenburg  2018         1         1       11.0      14.0
11       Malmo  Gothenburg  2018         0         3       12.0      15.0
I can calculate total points using stack, but since i have around 10 columns (all not displayed in this example), the result gets really messy.

df.columns = [['Team', 'Team', "Year", 'Points', 'Points'],
    ['Home', 'Away', 'Year', 'Home', 'Away']]

d1 = df.stack()
total = d1.groupby('Team').Points.apply(lambda x: x.shift().cumsum())
df = d1.assign(Total=total).unstack()

print(df)

   Points                  Team                  Year              Total           
     Away Home Year        Away        Home Year Away Home    Year  Away  Home Year
0     1.0  1.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   NaN   NaN  NaN
1     1.0  1.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0   1.0   1.0  NaN
2     3.0  0.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0   2.0   2.0  NaN
3     1.0  1.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   2.0   5.0  NaN
4     3.0  0.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   3.0   6.0  NaN
5     1.0  1.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   6.0   6.0  NaN
6     3.0  0.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   7.0   7.0  NaN
7     3.0  0.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0   7.0  10.0  NaN
8     1.0  1.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0  10.0  10.0  NaN
9     3.0  0.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0  11.0  11.0  NaN
10    1.0  1.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0  14.0  11.0  NaN
11    3.0  0.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0  15.0  12.0  NaN
I have also tried groupby home- and awayteam, but that doesn't give me the total points:

df.columns = ['Hometeam', 'Awayteam', "Year", 'homepoints', 'awayponts']
df["Homepoints_tot"] = df.groupby(['Hometeam', "Year"])['homepoints'].transform(lambda x: x.cumsum().shift())
df["Awaypoints_tot"] = df.groupby(['Awayteam', "Year"])['awayponts'].transform(lambda x: x.cumsum().shift())

      Hometeam    Awayteam  Year  homepoints  awayponts  Homepoints_tot  Awaypoints_tot
0   Gothenburg       Malmo  2018           1          1             NaN             NaN
1        Malmo  Gothenburg  2018           1          1             NaN             NaN
2        Malmo  Gothenburg  2018           0          3             1.0             1.0
3   Gothenburg       Malmo  2018           1          1             1.0             1.0
4   Gothenburg       Malmo  2018           0          3             2.0             2.0
5   Gothenburg       Malmo  2018           1          1             2.0             5.0
6   Gothenburg       Malmo  2018           0          3             3.0             6.0
7        Malmo  Gothenburg  2018           0          3             1.0             4.0
8   Gothenburg       Malmo  2018           1          1             3.0             9.0
9        Malmo  Gothenburg  2018           0          3             1.0             7.0
10       Malmo  Gothenburg  2018           1          1             1.0            10.0
11       Malmo  Gothenburg  2018           0          3             2.0            11.0
Any ideas how to get the desired output?
Reply
#2
I think you might have your desired output cumulative sums mixed up(or i don't understand), this output is not the same but i think it could be what you are after.

import pandas as pd

df = pd.DataFrame([
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Malmo", "Gothenburg",  2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Gothenburg", "Malmo", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Gothenburg", "Malmo", 2018, 0, 3],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Malmo", "Gothenburg", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
])

df.columns = ['H_team', 'A_team', 'Year', 'H_points', 'A_points']
df['H_cumsum'] = df['H_points'].cumsum().shift(1)
df['A_cumsum'] = df['A_points'].cumsum().shift(1)
print(df)
Output:
H_team A_team Year H_points A_points H_cumsum A_cumsum 0 Gothenburg Malmo 2018 1 1 NaN NaN 1 Malmo Gothenburg 2018 1 1 1.0 1.0 2 Malmo Gothenburg 2018 0 3 2.0 2.0 3 Gothenburg Malmo 2018 1 1 2.0 5.0 4 Gothenburg Malmo 2018 0 3 3.0 6.0 5 Gothenburg Malmo 2018 1 1 3.0 9.0 6 Gothenburg Malmo 2018 0 3 4.0 10.0 7 Malmo Gothenburg 2018 0 3 4.0 13.0 8 Gothenburg Malmo 2018 1 1 4.0 16.0 9 Malmo Gothenburg 2018 0 3 5.0 17.0 10 Malmo Gothenburg 2018 1 1 5.0 20.0 11 Malmo Gothenburg 2018 0 3 6.0 21.0
Reply
#3
No that's not correct, the desired output is what i described at top. It's a score table for soccer teams. The team get 1 point for draw and 3 for victory. Therefor the cumlative score for Gothenburg at row 11 should be: 1+1+3+1+0+1+0+3+1+3+1 = 15 (shifted 1 row, meaning don't include the 3 points at row 11)
Reply
#4
Ok, i only know little about pandas and nothing about soccer Huh . good luck finding a solution Smile
Reply
#5
Pandas is not my cup of tea but isn’t there pandas.cumsum?
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Reply
#6
Yea, i use cumsum but i can only get either the home points or the away points, like this:

df['H_cumsum'] = df['H_points'].cumsum().shift(1)
df['A_cumsum'] = df['A_points'].cumsum().shift(1)
I just don't know how to add the home points with the away points.
Reply
#7
Okey a different approach for the same problem. I have the same results but instead of points i have a value for either home win (H), draw (D) or away win (A). A win is 3 points for the winning team and draw is 1 point to both teams. I need the cumulative points for both the home and the away team.

This is the dataframe:

df  = pd.DataFrame([
["Gothenburg", "Malmo", 2018,"A"],
["Malmo","Gothenburg",  2018, "D"],
["Malmo", "Gothenburg", 2018, "A"],
["Gothenburg", "Malmo", 2018, "H"],
["Gothenburg", "Malmo" ,2018, "D"],
["Gothenburg", "Malmo", 2018, "A"],
["Gothenburg", "Malmo", 2018, "H"],
["Malmo", "Gothenburg", 2018, "A"],
["Gothenburg", "Malmo", 2018, "H"],
["Malmo", "Gothenburg", 2018, "H"],
[ "Malmo","Gothenburg", 2018, "D"],
[ "Malmo", "Gothenburg",2018, "H"],

])
And this is desired output:
      Hometeam    Awayteam  Year Tecken H_points A_points
0   Gothenburg       Malmo  2018      D      Nan      Nan
1        Malmo  Gothenburg  2018      D        1        1
2        Malmo  Gothenburg  2018      A        2        2
3   Gothenburg       Malmo  2018      D        5        2
4   Gothenburg       Malmo  2018      A        6        3
5   Gothenburg       Malmo  2018      D        6        6
6   Gothenburg       Malmo  2018      A        7        7
7        Malmo  Gothenburg  2018      A       10        7
8   Gothenburg       Malmo  2018      D       10       10
9        Malmo  Gothenburg  2018      A       11       11
10       Malmo  Gothenburg  2018      D       11       14
11       Malmo  Gothenburg  2018      A       12       15
Im losing my hair over this, hope someone can give me a hand :)
Reply
#8
I know it's ugly and I'm not experienced with pandas but it works:p

import pandas as pd

def get_team_names(df):
    return df[0][0], df[1][0]


df  = pd.DataFrame([
["Gothenburg", "Malmo", 2018,"A"],
["Malmo","Gothenburg",  2018, "D"],
["Malmo", "Gothenburg", 2018, "A"],
["Gothenburg", "Malmo", 2018, "H"],
["Gothenburg", "Malmo" ,2018, "D"],
["Gothenburg", "Malmo", 2018, "A"],
["Gothenburg", "Malmo", 2018, "H"],
["Malmo", "Gothenburg", 2018, "A"],
["Gothenburg", "Malmo", 2018, "H"],
["Malmo", "Gothenburg", 2018, "H"],
[ "Malmo","Gothenburg", 2018, "D"],
[ "Malmo", "Gothenburg",2018, "H"],
 
])

team_1, team_2 = get_team_names(df)

df.columns = ['H_team', 'A_team', 'Year', 'Result']

draws = df['Result']=='D'

hh = (df['Result']=='H').where(df['H_team'] == team_1) * 3
ha = (df['Result']=='A').where(df['A_team'] == team_1) * 3

ah = (df['Result']=='H').where(df['H_team'] == team_2) * 3
aa = (df['Result']=='A').where(df['A_team'] == team_2) * 3

team_1_points = hh.add(ha, fill_value=0).add(draws, fill_value=0).cumsum().shift(1)
team_2_points = ah.add(aa, fill_value=0).add(draws, fill_value=0).cumsum().shift(1)

df['H_points'] = team_1_points.where(df['H_team'] == team_1, team_2_points)
df['A_points'] = team_1_points.where(df['A_team'] == team_1, team_2_points)

print(df)
Output:
H_team A_team Year Result H_points A_points 0 Gothenburg Malmo 2018 A NaN NaN 1 Malmo Gothenburg 2018 D 3.0 0.0 2 Malmo Gothenburg 2018 A 4.0 1.0 3 Gothenburg Malmo 2018 H 4.0 4.0 4 Gothenburg Malmo 2018 D 7.0 4.0 5 Gothenburg Malmo 2018 A 8.0 5.0 6 Gothenburg Malmo 2018 H 8.0 8.0 7 Malmo Gothenburg 2018 A 8.0 11.0 8 Gothenburg Malmo 2018 H 14.0 8.0 9 Malmo Gothenburg 2018 H 8.0 17.0 10 Malmo Gothenburg 2018 D 11.0 17.0 11 Malmo Gothenburg 2018 H 12.0 18.0
Reply
#9
Yes, that works! Thank you so much, i've been struggling with this one.

I need to make a change though, the points need to be reset every year since i'm reading data from 4-5 years back. Is there a way to either group by year or maybe use reset so that the points starts at zero for a new year+ 2016, 2017, 2018 and so on...
Reply
#10
Glad I could help, btw I think it would be important to make sure that the 'Year' value actually groups games from particular season, not strictly a year.

import pandas as pd

def get_team_names(df):
    return df[0][0], df[1][0]

df  = pd.DataFrame([
["Gothenburg", "Malmo", 2010,"A"],
["Malmo","Gothenburg",  2010, "D"],
["Malmo", "Gothenburg", 2010, "A"],
["Gothenburg", "Malmo", 2010, "H"],
["Gothenburg", "Malmo" ,2011, "D"],
["Gothenburg", "Malmo", 2011, "A"],
["Gothenburg", "Malmo", 2011, "H"],
["Malmo", "Gothenburg", 2011, "A"],
["Gothenburg", "Malmo", 2012, "H"],
["Malmo", "Gothenburg", 2012, "H"],
[ "Malmo","Gothenburg", 2012, "D"],
[ "Malmo", "Gothenburg",2012, "H"],
["Malmo", "Gothenburg", 2013, "H"],
[ "Malmo","Gothenburg", 2013, "D"],
[ "Malmo", "Gothenburg",2013, "H"],
 
])

team_1, team_2 = get_team_names(df)

df.columns = ['H_team', 'A_team', 'Year', 'Result']

df['H_points'] = 0
df['A_points'] = 0

for year in set(df['Year']):

    draws = (df['Result']=='D').where(df['Year'] == year)

    hh = (df['Result']=='H').where((df['H_team'] == team_1) & (df['Year'] == year)) * 3
    ha = (df['Result']=='A').where((df['A_team'] == team_1) & (df['Year'] == year)) * 3

    ah = (df['Result']=='H').where((df['H_team'] == team_2) & (df['Year'] == year)) * 3
    aa = (df['Result']=='A').where((df['A_team'] == team_2) & (df['Year'] == year)) * 3

    team_1_points = hh.add(ha, fill_value=0).add(draws, fill_value=0).cumsum().shift(1)
    team_2_points = ah.add(aa, fill_value=0).add(draws, fill_value=0).cumsum().shift(1)

    df['H_points'] = df['H_points'].add( team_1_points.where(df['H_team'] == team_1, team_2_points), fill_value=0)
    df['A_points'] = df['A_points'].add( team_1_points.where(df['A_team'] == team_1, team_2_points), fill_value=0)

    # set first game of season to 0 points - https://stackoverflow.com/a/37911231/4620679
    # otherwise it would be equal to the sum of points from previous season (due to ".shift(1)" method)
    df.loc[df.index[df['Year'] == year][:1], 'H_points'] = 0 
    df.loc[df.index[df['Year'] == year][:1], 'A_points'] = 0

print(df)
Output:
H_team A_team Year Result H_points A_points 0 Gothenburg Malmo 2010 A 0.0 0.0 1 Malmo Gothenburg 2010 D 3.0 0.0 2 Malmo Gothenburg 2010 A 4.0 1.0 3 Gothenburg Malmo 2010 H 4.0 4.0 4 Gothenburg Malmo 2011 D 0.0 0.0 5 Gothenburg Malmo 2011 A 1.0 1.0 6 Gothenburg Malmo 2011 H 1.0 4.0 7 Malmo Gothenburg 2011 A 4.0 4.0 8 Gothenburg Malmo 2012 H 0.0 0.0 9 Malmo Gothenburg 2012 H 0.0 3.0 10 Malmo Gothenburg 2012 D 3.0 3.0 11 Malmo Gothenburg 2012 H 4.0 4.0 12 Malmo Gothenburg 2013 H 0.0 0.0 13 Malmo Gothenburg 2013 D 3.0 0.0 14 Malmo Gothenburg 2013 H 4.0 1.0
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  iretate over columns in df and calculate euclidean distance with one column in pandas Pit292 0 3,268 May-09-2021, 06:46 PM
Last Post: Pit292
  pandas.to_datetime: Combine data from 2 columns ju21878436312 1 2,420 Feb-20-2021, 08:25 PM
Last Post: perfringo
  Remove extra count columns created by pandas groupby spyf8 1 2,686 Feb-10-2021, 09:19 AM
Last Post: Naheed
  Pandas: summing columns conditional on the column labels ddd2332 0 2,076 Sep-10-2020, 05:58 PM
Last Post: ddd2332
  Difference of two columns in Pandas dataframe zinho 2 3,313 Jun-17-2020, 03:36 PM
Last Post: zinho
  Pandas dataframe columns collapsed in Spyder when printing UniKlixX 2 4,657 Nov-04-2019, 07:00 AM
Last Post: UniKlixX
  [pandas] How to re-arrange DataFrame columns SriMekala 8 4,852 Jun-22-2019, 12:55 AM
Last Post: scidam
  comparing two columns two different files in pandas nuncio 0 2,372 Jun-06-2018, 01:04 PM
Last Post: nuncio
  subtruction of columns in pandas garikhgh0 3 2,606 May-14-2018, 10:31 AM
Last Post: volcano63

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020