Pandas - cumulative sum of two columns

tobbs · May-25-2019, 11:28 AM

Thanks that sorted out the season problem! Yeah you are right, season is what matters actually. However year equeals season in this case.

I realised i did i stupid mistake in my example, i only displayed part of the datafrme showing just two teams. There are more teams (Orebro, Norrkoping and so on..) so in full dataframe the points doesn't calculate correct. I think i can sort this out by myself though.

For example i replace

draws = (df['Result']=='D').where(df['Year'] == year)

with

hd = (df['Result'] == 'D').where((df['H_team'] == team_1) & (df['Year'] == year))
ad = (df['Result'] == 'D').where((df['H_team'] == team_1) & (df['Year'] == year))

and on line 35, 36 i replace "draws" with "hd" and "ad"

I havn't quite figured out how to change

        df['H_points'] = df['H_points'].add(team_1_points.where(df['H_team'] == team_1, team_2_points), fill_value=0)
        df['A_points'] = df['A_points'].add(team_1_points.where(df['A_team'] == team_1, team_2_points), fill_value=0)

to handle more teams yet, but i guess i need extend the where condition?

Agian, big thanks for the time and the help! :)

michalmonday · (This post was last modified: May-25-2019, 01:56 PM by michalmonday.)

It took some tweaks to get it right, but I hope it's working well now:)

import pandas as pd

def get_team_names(df):
    return set(df['H_team']) # this is assuming that every team plays at least 1 home game


df  = pd.DataFrame([
["Gothenburg", "Malmo", 2010,"A"],
["Malmo","Gothenburg",  2010, "D"],
["Malmo", "Gothenburg", 2010, "A"],
["Gothenburg", "Malmo", 2010, "H"],

["foo", "bar", 2010,"H"],
["bar","foo",  2010, "H"],
["bar", "foo", 2010, "D"],
["foo", "bar", 2010, "H"],

["Gothenburg", "Malmo" ,2011, "D"],
["Gothenburg", "Malmo", 2011, "A"],
["Gothenburg", "Malmo", 2011, "H"],
["Malmo", "Gothenburg", 2011, "A"],

["foo", "bar" ,2011, "A"],
["foo", "bar", 2011, "A"],
["foo", "bar", 2011, "D"],
["bar", "foo", 2011, "H"],


["foo", "bar", 2012, "D"],
["bar", "foo", 2012, "D"],
[ "bar","foo", 2012, "A"],
[ "bar", "foo",2012, "A"],

["Gothenburg", "Malmo", 2012, "H"],
["Malmo", "Gothenburg", 2012, "H"],
[ "Malmo","Gothenburg", 2012, "D"],
[ "Malmo", "Gothenburg",2012, "H"],

["Malmo", "Gothenburg", 2013, "H"],
[ "Gothenburg","Malmo", 2013, "D"],
[ "Malmo", "Gothenburg",2013, "H"],

["bar", "foo", 2013, "D"],
[ "bar","foo", 2013, "H"],
[ "foo", "bar",2013, "A"],
 
])

df.columns = ['H_team', 'A_team', 'Year', 'Result']

df['H_points'] = 0
df['A_points'] = 0

for year in set(df['Year']):
    for team in get_team_names(df):
        draws = (df['Result']=='D').where((df['Year'] == year) & ((df['H_team'] == team) | (df['A_team'] == team)))
        h_wins = (df['Result']=='H').where((df['H_team'] == team) & (df['Year'] == year))
        a_wins = (df['Result']=='A').where((df['A_team'] == team) & (df['Year'] == year))
        team_points = (h_wins.add(a_wins, fill_value=0) * 3).add(draws, fill_value=0).cumsum().shift(1)
        df['H_points'] = df['H_points'].add( team_points.where(df['H_team'] == team), fill_value=0)
        df['A_points'] = df['A_points'].add( team_points.where(df['A_team'] == team), fill_value=0)

    fgi = df.index[(df['Year'] == year)][0]  # first game (of the season) index 
    df.loc[fgi, ('A_points', 'H_points')] = 0 # this is necessary because:
    ''' without these lines:
        23       Malmo  Gothenburg  2012      H       4.0       4.0
        24       Malmo  Gothenburg  2013      H       7.0       4.0

        with these lines:
        23       Malmo  Gothenburg  2012      H       4.0       4.0
        24       Malmo  Gothenburg  2013      H       0.0       0.0
    '''

print(df)

Output:        H_team      A_team  Year Result  H_points  A_points
0   Gothenburg       Malmo  2010      A       0.0       0.0
1        Malmo  Gothenburg  2010      D       3.0       0.0
2        Malmo  Gothenburg  2010      A       4.0       1.0
3   Gothenburg       Malmo  2010      H       4.0       4.0
4          foo         bar  2010      H       0.0       0.0
5          bar         foo  2010      H       0.0       3.0
6          bar         foo  2010      D       3.0       3.0
7          foo         bar  2010      H       4.0       4.0
8   Gothenburg       Malmo  2011      D       0.0       0.0
9   Gothenburg       Malmo  2011      A       1.0       1.0
10  Gothenburg       Malmo  2011      H       1.0       4.0
11       Malmo  Gothenburg  2011      A       4.0       4.0
12         foo         bar  2011      A       0.0       0.0
13         foo         bar  2011      A       0.0       3.0
14         foo         bar  2011      D       0.0       6.0
15         bar         foo  2011      H       7.0       1.0
16         foo         bar  2012      D       0.0       0.0
17         bar         foo  2012      D       1.0       1.0
18         bar         foo  2012      A       2.0       2.0
19         bar         foo  2012      A       2.0       5.0
20  Gothenburg       Malmo  2012      H       0.0       0.0
21       Malmo  Gothenburg  2012      H       0.0       3.0
22       Malmo  Gothenburg  2012      D       3.0       3.0
23       Malmo  Gothenburg  2012      H       4.0       4.0
24       Malmo  Gothenburg  2013      H       0.0       0.0
25  Gothenburg       Malmo  2013      D       0.0       3.0
26       Malmo  Gothenburg  2013      H       4.0       1.0
27         bar         foo  2013      D       0.0       0.0
28         bar         foo  2013      H       1.0       1.0
29         foo         bar  2013      A       1.0       4.0

tobbs · May-25-2019, 08:37 PM

This works perfect! Thank you so much, i really appreciate the time and effort you and other great coders put in here to help us newbs. Hope i can contribute myself someday. Thanks again, i would not have managed this one on my own :)

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	iretate over columns in df and calculate euclidean distance with one column in pandas	Pit292	0	3,313	May-09-2021, 06:46 PM Last Post: Pit292
	pandas.to_datetime: Combine data from 2 columns	ju21878436312	1	2,461	Feb-20-2021, 08:25 PM Last Post: perfringo
	Remove extra count columns created by pandas groupby	spyf8	1	2,741	Feb-10-2021, 09:19 AM Last Post: Naheed
	Pandas: summing columns conditional on the column labels	ddd2332	0	2,142	Sep-10-2020, 05:58 PM Last Post: ddd2332
	Difference of two columns in Pandas dataframe	zinho	2	3,383	Jun-17-2020, 03:36 PM Last Post: zinho
	Pandas dataframe columns collapsed in Spyder when printing	UniKlixX	2	4,759	Nov-04-2019, 07:00 AM Last Post: UniKlixX
	[pandas] How to re-arrange DataFrame columns	SriMekala	8	4,961	Jun-22-2019, 12:55 AM Last Post: scidam
	comparing two columns two different files in pandas	nuncio	0	2,411	Jun-06-2018, 01:04 PM Last Post: nuncio
	subtruction of columns in pandas	garikhgh0	3	2,660	May-14-2018, 10:31 AM Last Post: volcano63

Pandas - cumulative sum of two columns

User Panel Messages

Announcements