Pandas - cumulative sum of two columns - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: Data Science (https://python-forum.io/forum-44.html) +--- Thread: Pandas - cumulative sum of two columns (/thread-18470.html) Pages:
1
2
|
Pandas - cumulative sum of two columns - tobbs - May-19-2019 I want to calculate points for soccer teams. I have points for the hometeam and awayteam for each game and i want to get the sum for previous games. (not include the row i'm standing on) This is a part of the dataframe (i have more columns then displayed in the example): df = pd.DataFrame([ ["Gothenburg", "Malmo", 2018, 1, 1], ["Malmo","Gothenburg", 2018, 1, 1], ["Malmo", "Gothenburg", 2018, 0, 3], ["Gothenburg", "Malmo", 2018, 1, 1], ["Gothenburg", "Malmo" ,2018, 0, 3], ["Gothenburg", "Malmo", 2018, 1, 1], ["Gothenburg", "Malmo", 2018, 0, 3], ["Malmo", "Gothenburg", 2018, 0, 3], ["Gothenburg", "Malmo", 2018, 1, 1], ["Malmo", "Gothenburg", 2018, 0, 3], [ "Malmo","Gothenburg", 2018, 1, 1], [ "Malmo", "Gothenburg",2018, 0, 3], ])The desired output is: H_team A_team Year H_points A_points H_cumsum A_cumsum 0 Gothenburg Malmo 2018 1 1 NaN NaN 1 Malmo Gothenburg 2018 1 1 1.0 1.0 2 Malmo Gothenburg 2018 0 3 2.0 2.0 3 Gothenburg Malmo 2018 1 1 5.0 2.0 4 Gothenburg Malmo 2018 0 3 6.0 3.0 5 Gothenburg Malmo 2018 1 1 6.0 6.0 6 Gothenburg Malmo 2018 0 3 7.0 7.0 7 Malmo Gothenburg 2018 0 3 10.0 7.0 8 Gothenburg Malmo 2018 1 1 10.0 10.0 9 Malmo Gothenburg 2018 0 3 11.0 11.0 10 Malmo Gothenburg 2018 1 1 11.0 14.0 11 Malmo Gothenburg 2018 0 3 12.0 15.0I can calculate total points using stack, but since i have around 10 columns (all not displayed in this example), the result gets really messy. df.columns = [['Team', 'Team', "Year", 'Points', 'Points'], ['Home', 'Away', 'Year', 'Home', 'Away']] d1 = df.stack() total = d1.groupby('Team').Points.apply(lambda x: x.shift().cumsum()) df = d1.assign(Total=total).unstack() print(df) Points Team Year Total Away Home Year Away Home Year Away Home Year Away Home Year 0 1.0 1.0 NaN Malmo Gothenburg NaN NaN NaN 2018.0 NaN NaN NaN 1 1.0 1.0 NaN Gothenburg Malmo NaN NaN NaN 2018.0 1.0 1.0 NaN 2 3.0 0.0 NaN Gothenburg Malmo NaN NaN NaN 2018.0 2.0 2.0 NaN 3 1.0 1.0 NaN Malmo Gothenburg NaN NaN NaN 2018.0 2.0 5.0 NaN 4 3.0 0.0 NaN Malmo Gothenburg NaN NaN NaN 2018.0 3.0 6.0 NaN 5 1.0 1.0 NaN Malmo Gothenburg NaN NaN NaN 2018.0 6.0 6.0 NaN 6 3.0 0.0 NaN Malmo Gothenburg NaN NaN NaN 2018.0 7.0 7.0 NaN 7 3.0 0.0 NaN Gothenburg Malmo NaN NaN NaN 2018.0 7.0 10.0 NaN 8 1.0 1.0 NaN Malmo Gothenburg NaN NaN NaN 2018.0 10.0 10.0 NaN 9 3.0 0.0 NaN Gothenburg Malmo NaN NaN NaN 2018.0 11.0 11.0 NaN 10 1.0 1.0 NaN Gothenburg Malmo NaN NaN NaN 2018.0 14.0 11.0 NaN 11 3.0 0.0 NaN Gothenburg Malmo NaN NaN NaN 2018.0 15.0 12.0 NaNI have also tried groupby home- and awayteam, but that doesn't give me the total points: df.columns = ['Hometeam', 'Awayteam', "Year", 'homepoints', 'awayponts'] df["Homepoints_tot"] = df.groupby(['Hometeam', "Year"])['homepoints'].transform(lambda x: x.cumsum().shift()) df["Awaypoints_tot"] = df.groupby(['Awayteam', "Year"])['awayponts'].transform(lambda x: x.cumsum().shift()) Hometeam Awayteam Year homepoints awayponts Homepoints_tot Awaypoints_tot 0 Gothenburg Malmo 2018 1 1 NaN NaN 1 Malmo Gothenburg 2018 1 1 NaN NaN 2 Malmo Gothenburg 2018 0 3 1.0 1.0 3 Gothenburg Malmo 2018 1 1 1.0 1.0 4 Gothenburg Malmo 2018 0 3 2.0 2.0 5 Gothenburg Malmo 2018 1 1 2.0 5.0 6 Gothenburg Malmo 2018 0 3 3.0 6.0 7 Malmo Gothenburg 2018 0 3 1.0 4.0 8 Gothenburg Malmo 2018 1 1 3.0 9.0 9 Malmo Gothenburg 2018 0 3 1.0 7.0 10 Malmo Gothenburg 2018 1 1 1.0 10.0 11 Malmo Gothenburg 2018 0 3 2.0 11.0Any ideas how to get the desired output? RE: Pandas - cumulative sum of two columns - Yoriz - May-19-2019 I think you might have your desired output cumulative sums mixed up(or i don't understand), this output is not the same but i think it could be what you are after. import pandas as pd df = pd.DataFrame([ ["Gothenburg", "Malmo", 2018, 1, 1], ["Malmo", "Gothenburg", 2018, 1, 1], ["Malmo", "Gothenburg", 2018, 0, 3], ["Gothenburg", "Malmo", 2018, 1, 1], ["Gothenburg", "Malmo", 2018, 0, 3], ["Gothenburg", "Malmo", 2018, 1, 1], ["Gothenburg", "Malmo", 2018, 0, 3], ["Malmo", "Gothenburg", 2018, 0, 3], ["Gothenburg", "Malmo", 2018, 1, 1], ["Malmo", "Gothenburg", 2018, 0, 3], ["Malmo", "Gothenburg", 2018, 1, 1], ["Malmo", "Gothenburg", 2018, 0, 3], ]) df.columns = ['H_team', 'A_team', 'Year', 'H_points', 'A_points'] df['H_cumsum'] = df['H_points'].cumsum().shift(1) df['A_cumsum'] = df['A_points'].cumsum().shift(1) print(df)
RE: Pandas - cumulative sum of two columns - tobbs - May-19-2019 No that's not correct, the desired output is what i described at top. It's a score table for soccer teams. The team get 1 point for draw and 3 for victory. Therefor the cumlative score for Gothenburg at row 11 should be: 1+1+3+1+0+1+0+3+1+3+1 = 15 (shifted 1 row, meaning don't include the 3 points at row 11) RE: Pandas - cumulative sum of two columns - Yoriz - May-19-2019 Ok, i only know little about pandas and nothing about soccer . good luck finding a solution RE: Pandas - cumulative sum of two columns - perfringo - May-19-2019 Pandas is not my cup of tea but isn’t there pandas.cumsum? RE: Pandas - cumulative sum of two columns - tobbs - May-19-2019 Yea, i use cumsum but i can only get either the home points or the away points, like this: df['H_cumsum'] = df['H_points'].cumsum().shift(1) df['A_cumsum'] = df['A_points'].cumsum().shift(1)I just don't know how to add the home points with the away points. RE: Pandas - cumulative sum of two columns - tobbs - May-24-2019 Okey a different approach for the same problem. I have the same results but instead of points i have a value for either home win (H), draw (D) or away win (A). A win is 3 points for the winning team and draw is 1 point to both teams. I need the cumulative points for both the home and the away team. This is the dataframe: df = pd.DataFrame([ ["Gothenburg", "Malmo", 2018,"A"], ["Malmo","Gothenburg", 2018, "D"], ["Malmo", "Gothenburg", 2018, "A"], ["Gothenburg", "Malmo", 2018, "H"], ["Gothenburg", "Malmo" ,2018, "D"], ["Gothenburg", "Malmo", 2018, "A"], ["Gothenburg", "Malmo", 2018, "H"], ["Malmo", "Gothenburg", 2018, "A"], ["Gothenburg", "Malmo", 2018, "H"], ["Malmo", "Gothenburg", 2018, "H"], [ "Malmo","Gothenburg", 2018, "D"], [ "Malmo", "Gothenburg",2018, "H"], ])And this is desired output: Hometeam Awayteam Year Tecken H_points A_points 0 Gothenburg Malmo 2018 D Nan Nan 1 Malmo Gothenburg 2018 D 1 1 2 Malmo Gothenburg 2018 A 2 2 3 Gothenburg Malmo 2018 D 5 2 4 Gothenburg Malmo 2018 A 6 3 5 Gothenburg Malmo 2018 D 6 6 6 Gothenburg Malmo 2018 A 7 7 7 Malmo Gothenburg 2018 A 10 7 8 Gothenburg Malmo 2018 D 10 10 9 Malmo Gothenburg 2018 A 11 11 10 Malmo Gothenburg 2018 D 11 14 11 Malmo Gothenburg 2018 A 12 15Im losing my hair over this, hope someone can give me a hand :) RE: Pandas - cumulative sum of two columns - michalmonday - May-24-2019 I know it's ugly and I'm not experienced with pandas but it works:p import pandas as pd def get_team_names(df): return df[0][0], df[1][0] df = pd.DataFrame([ ["Gothenburg", "Malmo", 2018,"A"], ["Malmo","Gothenburg", 2018, "D"], ["Malmo", "Gothenburg", 2018, "A"], ["Gothenburg", "Malmo", 2018, "H"], ["Gothenburg", "Malmo" ,2018, "D"], ["Gothenburg", "Malmo", 2018, "A"], ["Gothenburg", "Malmo", 2018, "H"], ["Malmo", "Gothenburg", 2018, "A"], ["Gothenburg", "Malmo", 2018, "H"], ["Malmo", "Gothenburg", 2018, "H"], [ "Malmo","Gothenburg", 2018, "D"], [ "Malmo", "Gothenburg",2018, "H"], ]) team_1, team_2 = get_team_names(df) df.columns = ['H_team', 'A_team', 'Year', 'Result'] draws = df['Result']=='D' hh = (df['Result']=='H').where(df['H_team'] == team_1) * 3 ha = (df['Result']=='A').where(df['A_team'] == team_1) * 3 ah = (df['Result']=='H').where(df['H_team'] == team_2) * 3 aa = (df['Result']=='A').where(df['A_team'] == team_2) * 3 team_1_points = hh.add(ha, fill_value=0).add(draws, fill_value=0).cumsum().shift(1) team_2_points = ah.add(aa, fill_value=0).add(draws, fill_value=0).cumsum().shift(1) df['H_points'] = team_1_points.where(df['H_team'] == team_1, team_2_points) df['A_points'] = team_1_points.where(df['A_team'] == team_1, team_2_points) print(df)
RE: Pandas - cumulative sum of two columns - tobbs - May-25-2019 Yes, that works! Thank you so much, i've been struggling with this one. I need to make a change though, the points need to be reset every year since i'm reading data from 4-5 years back. Is there a way to either group by year or maybe use reset so that the points starts at zero for a new year+ 2016, 2017, 2018 and so on... RE: Pandas - cumulative sum of two columns - michalmonday - May-25-2019 Glad I could help, btw I think it would be important to make sure that the 'Year' value actually groups games from particular season, not strictly a year. import pandas as pd def get_team_names(df): return df[0][0], df[1][0] df = pd.DataFrame([ ["Gothenburg", "Malmo", 2010,"A"], ["Malmo","Gothenburg", 2010, "D"], ["Malmo", "Gothenburg", 2010, "A"], ["Gothenburg", "Malmo", 2010, "H"], ["Gothenburg", "Malmo" ,2011, "D"], ["Gothenburg", "Malmo", 2011, "A"], ["Gothenburg", "Malmo", 2011, "H"], ["Malmo", "Gothenburg", 2011, "A"], ["Gothenburg", "Malmo", 2012, "H"], ["Malmo", "Gothenburg", 2012, "H"], [ "Malmo","Gothenburg", 2012, "D"], [ "Malmo", "Gothenburg",2012, "H"], ["Malmo", "Gothenburg", 2013, "H"], [ "Malmo","Gothenburg", 2013, "D"], [ "Malmo", "Gothenburg",2013, "H"], ]) team_1, team_2 = get_team_names(df) df.columns = ['H_team', 'A_team', 'Year', 'Result'] df['H_points'] = 0 df['A_points'] = 0 for year in set(df['Year']): draws = (df['Result']=='D').where(df['Year'] == year) hh = (df['Result']=='H').where((df['H_team'] == team_1) & (df['Year'] == year)) * 3 ha = (df['Result']=='A').where((df['A_team'] == team_1) & (df['Year'] == year)) * 3 ah = (df['Result']=='H').where((df['H_team'] == team_2) & (df['Year'] == year)) * 3 aa = (df['Result']=='A').where((df['A_team'] == team_2) & (df['Year'] == year)) * 3 team_1_points = hh.add(ha, fill_value=0).add(draws, fill_value=0).cumsum().shift(1) team_2_points = ah.add(aa, fill_value=0).add(draws, fill_value=0).cumsum().shift(1) df['H_points'] = df['H_points'].add( team_1_points.where(df['H_team'] == team_1, team_2_points), fill_value=0) df['A_points'] = df['A_points'].add( team_1_points.where(df['A_team'] == team_1, team_2_points), fill_value=0) # set first game of season to 0 points - https://stackoverflow.com/a/37911231/4620679 # otherwise it would be equal to the sum of points from previous season (due to ".shift(1)" method) df.loc[df.index[df['Year'] == year][:1], 'H_points'] = 0 df.loc[df.index[df['Year'] == year][:1], 'A_points'] = 0 print(df)
|