May-19-2019, 09:29 AM
I want to calculate points for soccer teams. I have points for the hometeam and awayteam for each game and i want to get the sum for previous games. (not include the row i'm standing on) This is a part of the dataframe (i have more columns then displayed in the example):
df = pd.DataFrame([ ["Gothenburg", "Malmo", 2018, 1, 1], ["Malmo","Gothenburg", 2018, 1, 1], ["Malmo", "Gothenburg", 2018, 0, 3], ["Gothenburg", "Malmo", 2018, 1, 1], ["Gothenburg", "Malmo" ,2018, 0, 3], ["Gothenburg", "Malmo", 2018, 1, 1], ["Gothenburg", "Malmo", 2018, 0, 3], ["Malmo", "Gothenburg", 2018, 0, 3], ["Gothenburg", "Malmo", 2018, 1, 1], ["Malmo", "Gothenburg", 2018, 0, 3], [ "Malmo","Gothenburg", 2018, 1, 1], [ "Malmo", "Gothenburg",2018, 0, 3], ])The desired output is:
H_team A_team Year H_points A_points H_cumsum A_cumsum 0 Gothenburg Malmo 2018 1 1 NaN NaN 1 Malmo Gothenburg 2018 1 1 1.0 1.0 2 Malmo Gothenburg 2018 0 3 2.0 2.0 3 Gothenburg Malmo 2018 1 1 5.0 2.0 4 Gothenburg Malmo 2018 0 3 6.0 3.0 5 Gothenburg Malmo 2018 1 1 6.0 6.0 6 Gothenburg Malmo 2018 0 3 7.0 7.0 7 Malmo Gothenburg 2018 0 3 10.0 7.0 8 Gothenburg Malmo 2018 1 1 10.0 10.0 9 Malmo Gothenburg 2018 0 3 11.0 11.0 10 Malmo Gothenburg 2018 1 1 11.0 14.0 11 Malmo Gothenburg 2018 0 3 12.0 15.0I can calculate total points using stack, but since i have around 10 columns (all not displayed in this example), the result gets really messy.
df.columns = [['Team', 'Team', "Year", 'Points', 'Points'], ['Home', 'Away', 'Year', 'Home', 'Away']] d1 = df.stack() total = d1.groupby('Team').Points.apply(lambda x: x.shift().cumsum()) df = d1.assign(Total=total).unstack() print(df) Points Team Year Total Away Home Year Away Home Year Away Home Year Away Home Year 0 1.0 1.0 NaN Malmo Gothenburg NaN NaN NaN 2018.0 NaN NaN NaN 1 1.0 1.0 NaN Gothenburg Malmo NaN NaN NaN 2018.0 1.0 1.0 NaN 2 3.0 0.0 NaN Gothenburg Malmo NaN NaN NaN 2018.0 2.0 2.0 NaN 3 1.0 1.0 NaN Malmo Gothenburg NaN NaN NaN 2018.0 2.0 5.0 NaN 4 3.0 0.0 NaN Malmo Gothenburg NaN NaN NaN 2018.0 3.0 6.0 NaN 5 1.0 1.0 NaN Malmo Gothenburg NaN NaN NaN 2018.0 6.0 6.0 NaN 6 3.0 0.0 NaN Malmo Gothenburg NaN NaN NaN 2018.0 7.0 7.0 NaN 7 3.0 0.0 NaN Gothenburg Malmo NaN NaN NaN 2018.0 7.0 10.0 NaN 8 1.0 1.0 NaN Malmo Gothenburg NaN NaN NaN 2018.0 10.0 10.0 NaN 9 3.0 0.0 NaN Gothenburg Malmo NaN NaN NaN 2018.0 11.0 11.0 NaN 10 1.0 1.0 NaN Gothenburg Malmo NaN NaN NaN 2018.0 14.0 11.0 NaN 11 3.0 0.0 NaN Gothenburg Malmo NaN NaN NaN 2018.0 15.0 12.0 NaNI have also tried groupby home- and awayteam, but that doesn't give me the total points:
df.columns = ['Hometeam', 'Awayteam', "Year", 'homepoints', 'awayponts'] df["Homepoints_tot"] = df.groupby(['Hometeam', "Year"])['homepoints'].transform(lambda x: x.cumsum().shift()) df["Awaypoints_tot"] = df.groupby(['Awayteam', "Year"])['awayponts'].transform(lambda x: x.cumsum().shift()) Hometeam Awayteam Year homepoints awayponts Homepoints_tot Awaypoints_tot 0 Gothenburg Malmo 2018 1 1 NaN NaN 1 Malmo Gothenburg 2018 1 1 NaN NaN 2 Malmo Gothenburg 2018 0 3 1.0 1.0 3 Gothenburg Malmo 2018 1 1 1.0 1.0 4 Gothenburg Malmo 2018 0 3 2.0 2.0 5 Gothenburg Malmo 2018 1 1 2.0 5.0 6 Gothenburg Malmo 2018 0 3 3.0 6.0 7 Malmo Gothenburg 2018 0 3 1.0 4.0 8 Gothenburg Malmo 2018 1 1 3.0 9.0 9 Malmo Gothenburg 2018 0 3 1.0 7.0 10 Malmo Gothenburg 2018 1 1 1.0 10.0 11 Malmo Gothenburg 2018 0 3 2.0 11.0Any ideas how to get the desired output?