Oct-16-2020, 11:09 AM
Hello,
I am frequently required to calculate stuff for a number of dataframe columns, but with totals included.
I know I can ignore a specific dimension using the transform function (https://stackoverflow.com/questions/3094...-in-pandas)
But this adds a second column, while I want extra rows with totals for a non-specified number of columns.
I currently use an expensive loop (cf below "for e in grpby"), but this causes the runtime to explode dramatically.
Below is a simplified example of what I do. Does anyone know a more pythonic way of adding those totals?
Any help is vastly appreciated!
Regards,
Mikis
I am frequently required to calculate stuff for a number of dataframe columns, but with totals included.
I know I can ignore a specific dimension using the transform function (https://stackoverflow.com/questions/3094...-in-pandas)
But this adds a second column, while I want extra rows with totals for a non-specified number of columns.
I currently use an expensive loop (cf below "for e in grpby"), but this causes the runtime to explode dramatically.
Below is a simplified example of what I do. Does anyone know a more pythonic way of adding those totals?
Any help is vastly appreciated!
Regards,
Mikis
import pandas as pd region = ['france', 'france', 'france', 'france', 'uk', 'uk', 'uk', 'uk'] illness = ['pain', 'pain', 'no pain', 'no pain', 'pain', 'pain', 'no pain', 'no pain'] respondent = [1, 2, 3, 1, 2, 3, 1, 2] numbers = [1, 2, 3, 4, 5, 6, 7, 8] data = list(zip(region, illness, respondent, numbers)) columns = ['region', 'illness', 'respondent', 'numbers'] df = pd.DataFrame(data = data, columns=(columns)) grpby = ['region', 'illness'] df_scores = df.groupby(grpby) \ .agg({'numbers':'mean', 'respondent':pd.Series.nunique}) \ .rename(columns={'numbers':'score', 'respondent':"sample"}) \ .reset_index() #But I also want the total of region & illness (cf next) #------------ THIS IS WHAT I DO RIGHT NOW # Add all those totals within a loop (not very pythonic) df_totals = df.copy() for e in grpby: tmp = df_totals.copy() tmp[e] = 'All' df_totals = pd.concat([df_totals, tmp], ignore_index=True) # And then group by, which gives me the outcome I want df_scores_total = df_totals.groupby(grpby) \ .agg({'numbers':'mean', 'respondent':pd.Series.nunique}) \ .rename(columns={'numbers':'score', 'respondent':"sample"}) \ .reset_index()