column grouping (sum) - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: Homework (https://python-forum.io/forum-9.html) +--- Thread: column grouping (sum) (/thread-2326.html) |
column grouping (sum) - metalray - Mar-07-2017 Dear Python Experts, I am looking again at Daniel Breen's case study In [69]: he writes a function called convert_housing_data_to_quarters(). http://danielbreen.net/projects/housing_prices_college_towns/ The data is providing columns for every months from 1996-04 to 2016q3. He is looping through the data to only consider columns from 2000q1 through 2016q3 and sums them up in quarters. He does that in a nexted loop that looks like it is dealing specifically with q3 and q4 of 2016. for year in range(2000,2017): for quarter in range(1,5): if quarter == 4 and year == 2016: break new_column_name = '{0}q{1}'.format(year, quarter) begin_month = (quarter-1)*3 + 1 end_month = quarter*3 begin_column = '{0}-{1:02d}'.format(year,begin_month) end_column = '{0}-{1:02d}'.format(year,end_month) if quarter == 3 and year == 2016: new_column_name = '2016q3' begin_month = 6 end_month = 8 begin_column = '{0}-{1:02d}'.format(year,begin_month) end_column = '{0}-{1:02d}'.format(year,end_month) data = housing_df.loc[:,begin_column:end_column] housing_df[new_column_name] = data.mean(axis = 1)I wonder if there is an easier way. I would rahter fix the non-existing q3 and q4 problem without a nested loop. It would be really great if someone has a suggestion how to do that. Maybe there is a function I dont know yet. RE: column grouping (sum) - sparkz_alot - Mar-07-2017 It's very difficult to read with all the formatting. When posting between the code tags, try pasting using "CTRL + SHIFT + V", this should remove the formatting portion. RE: column grouping (sum) - zivoni - Mar-07-2017 Yes, there are functions to do it "more directly". Common way is to convert given dataframe to a "narrow" format, where column names become one variable, after that do some transformation and convert it back to a "wide" format, where content of one (or more) column gives new column names. It has some similarity with Excel pivot tables. There is pandas.melt() , that can be used to convert dataframe from wide to narrow format, and .pivot() or.unstack() can be used to convert narrow to wide (there are other functions too).Unfortunately even with those functions it could look rather ugly: In reality it is not so bad:
|