column grouping (sum)

***zivoni*** · Mar-07-2017, 07:15 PM

Yes, there are functions to do it "more directly". A common way is to convert the given dataframe to a "narrow" format, where the column names become the values of one variable, then do some transformation and convert it back to a "wide" format, where the content of one (or more) columns supplies the new column names. It has some similarity with Excel pivot tables.

There is pandas.melt(), which can be used to convert a dataframe from wide to narrow format, and .pivot() or .unstack() can be used to convert narrow to wide (there are other functions too).
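As a minimal sketch of that round trip (the toy frame and its `id` column are made up for illustration, they are not part of the question's data):

```python
import pandas as pd

# Toy wide frame: one row per id, one column per month (made-up values).
wide = pd.DataFrame({"id": [0, 1], "2001-01": [1, 2], "2001-02": [3, 4]})

# Wide -> narrow: the month column names become values of a "variable" column,
# and the cell contents end up in a "value" column (pandas' default names).
narrow = pd.melt(wide, id_vars="id")

# Narrow -> wide: the values of "variable" become column names again.
back = narrow.pivot(index="id", columns="variable", values="value")

print(narrow)
print(back)
```

Any transformation of the column labels can be done in between, on the `variable` column of the narrow frame.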

Unfortunately, even with those functions it can look rather ugly:

Output:
In [214]: data = {"state":["New York", "California"], "region":["New York", "Los Angeles"], "2001-01":[123,345], "2001-02":[343,132], "2001-03":[63,423], "2001-04":[393,42]}

In [215]: df = pd.DataFrame(data, columns=["state", "region", "2001-01", "2001-02", "2001-03", "2001-04"])

In [216]: df
Out[216]: 
        state       region  2001-01  2001-02  2001-03  2001-04
0    New York     New York      123      343       63      393
1  California  Los Angeles      345      132      423       42

In [217]: wide = df.drop(['state', 'region'], axis=1).reset_index()
     ...: melted = pd.melt(wide, id_vars='index') 
     ...: melted.variable = melted.variable.apply(lambda x : "{}q{}".format(x[:4], (int(x[5:]) - 1) // 3 + 1))
     ...: grouped = melted.groupby(['index','variable']).sum().unstack()
     ...: grouped.columns = grouped.columns.get_level_values(1)
     ...: df[['state', 'region']].join(grouped)
     ...: 
Out[217]: 
        state       region  2001q1  2001q2
0    New York     New York     529     393
1  California  Los Angeles     900      42

In reality it is not so bad:

1. cut out the part to modify
2. convert it to narrow
3. aggregate
4. convert to wide
5. join with the remaining part
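Those five steps could be wrapped into a small helper, roughly like this. It is only a sketch: the names `month_to_quarter` and `sum_by_quarter` are made up here, it assumes the dataframe has a default unnamed index (so that `reset_index()` produces a column called `index`) and that every column not listed in `keep` is labelled `YYYY-MM`. It also selects the `value` column before summing, a small variation that leaves no column multiindex to flatten afterwards.

```python
import pandas as pd


def month_to_quarter(label):
    # "2001-04" -> "2001q2": months 1-3 map to q1, 4-6 to q2, and so on.
    year, month = label[:4], int(label[5:])
    return "{}q{}".format(year, (month - 1) // 3 + 1)


def sum_by_quarter(df, keep):
    # 1. cut out the part to modify, keeping the row index as a column
    wide = df.drop(keep, axis=1).reset_index()
    # 2. convert it to narrow
    melted = pd.melt(wide, id_vars="index")
    # 3. aggregate: relabel months as quarters, then sum within each quarter
    melted["variable"] = melted["variable"].apply(month_to_quarter)
    grouped = melted.groupby(["index", "variable"])["value"].sum()
    # 4. convert to wide: quarter labels become column names
    quarters = grouped.unstack()
    # 5. join with the remaining part on the row index
    return df[keep].join(quarters)


data = {"state": ["New York", "California"],
        "region": ["New York", "Los Angeles"],
        "2001-01": [123, 345], "2001-02": [343, 132],
        "2001-03": [63, 423], "2001-04": [393, 42]}
df = pd.DataFrame(data)

result = sum_by_quarter(df, ["state", "region"])
print(result)
```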

In more detail:

Output:
In [199]: wide = df.drop(['state', 'region'], axis=1).reset_index()
# as there are columns that won't be transformed, we remove them and copy the index into a column to use later for the join
# (it could be done with all columns, but it would be uglier)

In [200]: wide
Out[200]: 
   index  2001-01  2001-02  2001-03  2001-04
0      0      123      343       63      393
1      1      345      132      423       42

In [201]: melted = pd.melt(wide, id_vars='index') 
# actual conversion to a "narrow" format

In [202]: melted
Out[202]: 
   index variable  value
0      0  2001-01    123
1      1  2001-01    345
2      0  2001-02    343
3      1  2001-02    132
4      0  2001-03     63
5      1  2001-03    423
6      0  2001-04    393
7      1  2001-04     42

In [203]: melted.variable = melted.variable.apply(lambda x : "{}q{}".format(x[:4], (int(x[5:]) - 1) // 3 + 1))
# monthly labels are converted to quarters with .apply

In [204]: melted
Out[204]: 
   index variable  value
0      0   2001q1    123
1      1   2001q1    345
2      0   2001q1    343
3      1   2001q1    132
4      0   2001q1     63
5      1   2001q1    423
6      0   2001q2    393
7      1   2001q2     42

In [206]: grouped = melted.groupby(['index','variable']).sum()
# summing over the quarters

In [207]: grouped
Out[207]: 
                value
index variable       
0     2001q1      529
      2001q2      393
1     2001q1      900
      2001q2       42

In [208]: grouped = grouped.unstack()
# "pivoting" - "variable" values become new columns

In [209]: grouped
Out[209]: 
          value       
variable 2001q1 2001q2
index                 
0           529    393
1           900     42

In [210]: grouped.columns = grouped.columns.get_level_values(1)
# "Flattening" column multiindex

In [211]: grouped
Out[211]: 
variable  2001q1  2001q2
index                   
0            529     393
1            900      42
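The flattening step can be written in a few equivalent ways. The sketch below rebuilds a narrow frame with the quarterly sums from this example and compares three options; which one reads best is a matter of taste.

```python
import pandas as pd

# Narrow frame holding the quarterly sums from the example above.
narrow = pd.DataFrame({
    "index": [0, 0, 1, 1],
    "variable": ["2001q1", "2001q2", "2001q1", "2001q2"],
    "value": [529, 393, 900, 42],
})

# After unstack() the columns are a two-level multiindex: ("value", "2001q1"), ...
grouped = narrow.groupby(["index", "variable"]).sum().unstack()

# Option A: keep only the second level of the column labels (as in the post).
flat_a = grouped.copy()
flat_a.columns = flat_a.columns.get_level_values(1)

# Option B: select the single top-level key, which drops that level.
flat_b = grouped["value"]

# Option C: drop the first column level explicitly.
flat_c = grouped.droplevel(0, axis=1)

print(flat_a)
```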

In [212]: df[['state', 'region']].join(grouped)
# final merge
Out[212]: 
        state       region  2001q1  2001q2
0    New York     New York     529     393
1  California  Los Angeles     900      42
