Nov-25-2021, 12:26 PM
I´m using the following code to estimate the moving average for each column (year X) of a dataframe considering a window (pandas.DataFrame.rolling first parameter) that varies from 1 to 100.
The code is fine but when I need to increase the number of columns in the dataframe I have to include a lot of new lines, hence the probability of making a mistake increases considerably. I have tried to solve this using a numpy array but I came across a very different problem that i was not able to solve.
Can this code be improved to handle the increase of the dataframe rows?
Thank you
The code is fine but when I need to increase the number of columns in the dataframe I have to include a lot of new lines, hence the probability of making a mistake increases considerably. I have tried to solve this using a numpy array but I came across a very different problem that i was not able to solve.
Can this code be improved to handle the increase of the dataframe rows?
Thank you
# Import data data = pd.read_excel('moving_average.xlsx', index_col=0, header=0) data.columns = ['year1','year2','year3','year4', 'year5','year6','year7', 'year8', 'year9'] # Add filtered data tmp1 = data.loc[:,['year1','year2','year3','year4', 'year5','year6','year7', 'year8', 'year9']].rolling(1, center=False, axis=0).mean() tmp1.columns = ['year1Filter_1', 'year2Filter_1', 'year3Filter_1','year4Filter_1','year5Filter_1','year6Filter_1','year7Filter_1','year8Filter_1','year9Filter_1'] tmp2 = data.loc[:,['year1','year2','year3','year4', 'year5','year6','year7', 'year8', 'year9']].rolling(2, center=False, axis=0).mean() tmp2.columns = ['year1Filter_2', 'year2Filter_2', 'year3Filter_2','year4Filter_2','year5Filter_2','year6Filter_2','year7Filter_2','year8Filter_2','year9Filter_2'] tmp3 = data.loc[:,['year1','year2','year3','year4', 'year5','year6','year7', 'year8', 'year9']].rolling(3, center=False, axis=0).mean() tmp3.columns = ['year1Filter_3', 'year2Filter_3', 'year3Filter_3','year4Filter_3','year5Filter_3','year6Filter_3','year7Filter_3','year8Filter_3','year9Filter3'] .... tmp100 = data.loc[:,['year1','year2','year3','year4', 'year5','year6','year7', 'year8', 'year9']].rolling(100, center=False, axis=0).mean() tmp3.columns = ['year1Filter_100', 'year2Filter_100', 'year3Filter_100','year4Filter_100','year5Filter_100','year6Filter_100','year7Filter_100','year8Filter_100','year9Filter100'] data = pd.concat((data, tmp1, tmp2,tmp3,...tmp100), axis=1) sel1 = data.columns.map(lambda x: bool(re.search('year1',x))) out1 = data[data.columns[sel1]] sel2 = data.columns.map(lambda x: bool(re.search('year2',x))) out2 = data[data.columns[sel2]] sel3 = data.columns.map(lambda x: bool(re.search('year3',x))) out3 = data[data.columns[sel3]] ... sel9 = data.columns.map(lambda x: bool(re.search('year9',x))) out3 = data[data.columns[sel3]] #Export results to Excel writer = pd.ExcelWriter('year1.xlsx') out1.to_excel(writer,'data') writer.save() ... writer = pd.ExcelWriter('year9.xlsx') out9.to_excel(writer,'data') writer.save()