![]() |
Simple pandas question - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: Data Science (https://python-forum.io/forum-44.html) +--- Thread: Simple pandas question (/thread-35641.html) |
Simple pandas question - mcva - Nov-25-2021 I´m using the following code to estimate the moving average for each column (year X) of a dataframe considering a window (pandas.DataFrame.rolling first parameter) that varies from 1 to 100. The code is fine but when I need to increase the number of columns in the dataframe I have to include a lot of new lines, hence the probability of making a mistake increases considerably. I have tried to solve this using a numpy array but I came across a very different problem that i was not able to solve. Can this code be improved to handle the increase of the dataframe rows? Thank you # Import data data = pd.read_excel('moving_average.xlsx', index_col=0, header=0) data.columns = ['year1','year2','year3','year4', 'year5','year6','year7', 'year8', 'year9'] # Add filtered data tmp1 = data.loc[:,['year1','year2','year3','year4', 'year5','year6','year7', 'year8', 'year9']].rolling(1, center=False, axis=0).mean() tmp1.columns = ['year1Filter_1', 'year2Filter_1', 'year3Filter_1','year4Filter_1','year5Filter_1','year6Filter_1','year7Filter_1','year8Filter_1','year9Filter_1'] tmp2 = data.loc[:,['year1','year2','year3','year4', 'year5','year6','year7', 'year8', 'year9']].rolling(2, center=False, axis=0).mean() tmp2.columns = ['year1Filter_2', 'year2Filter_2', 'year3Filter_2','year4Filter_2','year5Filter_2','year6Filter_2','year7Filter_2','year8Filter_2','year9Filter_2'] tmp3 = data.loc[:,['year1','year2','year3','year4', 'year5','year6','year7', 'year8', 'year9']].rolling(3, center=False, axis=0).mean() tmp3.columns = ['year1Filter_3', 'year2Filter_3', 'year3Filter_3','year4Filter_3','year5Filter_3','year6Filter_3','year7Filter_3','year8Filter_3','year9Filter3'] .... tmp100 = data.loc[:,['year1','year2','year3','year4', 'year5','year6','year7', 'year8', 'year9']].rolling(100, center=False, axis=0).mean() tmp3.columns = ['year1Filter_100', 'year2Filter_100', 'year3Filter_100','year4Filter_100','year5Filter_100','year6Filter_100','year7Filter_100','year8Filter_100','year9Filter100'] data = pd.concat((data, tmp1, tmp2,tmp3,...tmp100), axis=1) sel1 = data.columns.map(lambda x: bool(re.search('year1',x))) out1 = data[data.columns[sel1]] sel2 = data.columns.map(lambda x: bool(re.search('year2',x))) out2 = data[data.columns[sel2]] sel3 = data.columns.map(lambda x: bool(re.search('year3',x))) out3 = data[data.columns[sel3]] ... sel9 = data.columns.map(lambda x: bool(re.search('year9',x))) out3 = data[data.columns[sel3]] #Export results to Excel writer = pd.ExcelWriter('year1.xlsx') out1.to_excel(writer,'data') writer.save() ... writer = pd.ExcelWriter('year9.xlsx') out9.to_excel(writer,'data') writer.save() RE: Simple pandas question - jefsummers - Nov-30-2021 Yes, a bit messy and I am sure can be simplified, but it is unclear to me what you want to do. Can you explain better (words, I see the code)? RE: Simple pandas question - mcva - Nov-30-2021 (Nov-30-2021, 03:13 AM)jefsummers Wrote: Yes, a bit messy and I am sure can be simplified, but it is unclear to me what you want to do. Can you explain better (words, I see the code)? I have a dataframe with 1000 columns all with the same length (365 values), and I need to calculate de moving average of each column. But I need to do this for diferent periods, from 1 to 100. A period is the pandas.DataFrame.rolling function first parameter. So for each of the 1000 columns (years) I need to get 100 new datasets (They will have a diferent length with the increase of the moving average period, this is the reason why I can´t use a numpy array instead of a dataframe) In my code I´m doing the following: 1) tm1, tm2 and tm3 are the moving averages considering a period of 1, 2 and 3 for each of the nine columns. I need to do this until I reach tm100... 2) Then I´m using concat() to create a single dataframe. 3) I´m using re.search() to reorganize the columns by year. 4) Export to excel. Thank you RE: Simple pandas question - jefsummers - Dec-05-2021 Use a list of dataframes, tm[0..100] rather than 100 variables. Will be way easier to maintain. You can still use all of your other methods, but put the process into loop or list comprehensions - will be shorter, easier to maintain, and faster. RE: Simple pandas question - mcva - Dec-17-2021 This is a good solution... import pandas as pd import numpy as np # Import data data = pd.read_excel('moving_average.xlsx', index_col=0, header=0) results = [] for i in range(100): tmp = data.rolling(i+1, center=False, axis=0).mean() results.append(tmp) df=pd.concat(results, axis=1) df.to_excel("output.xlsx") |