Nov-25-2021, 12:26 PM
I´m using the following code to estimate the moving average for each column (year X) of a dataframe considering a window (pandas.DataFrame.rolling first parameter) that varies from 1 to 100.
The code is fine but when I need to increase the number of columns in the dataframe I have to include a lot of new lines, hence the probability of making a mistake increases considerably. I have tried to solve this using a numpy array but I came across a very different problem that i was not able to solve.
Can this code be improved to handle the increase of the dataframe rows?
Thank you
The code is fine but when I need to increase the number of columns in the dataframe I have to include a lot of new lines, hence the probability of making a mistake increases considerably. I have tried to solve this using a numpy array but I came across a very different problem that i was not able to solve.
Can this code be improved to handle the increase of the dataframe rows?
Thank you
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 |
# Import data data = pd.read_excel( 'moving_average.xlsx' , index_col = 0 , header = 0 ) data.columns = [ 'year1' , 'year2' , 'year3' , 'year4' , 'year5' , 'year6' , 'year7' , 'year8' , 'year9' ] # Add filtered data tmp1 = data.loc[:,[ 'year1' , 'year2' , 'year3' , 'year4' , 'year5' , 'year6' , 'year7' , 'year8' , 'year9' ]].rolling( 1 , center = False , axis = 0 ).mean() tmp1.columns = [ 'year1Filter_1' , 'year2Filter_1' , 'year3Filter_1' , 'year4Filter_1' , 'year5Filter_1' , 'year6Filter_1' , 'year7Filter_1' , 'year8Filter_1' , 'year9Filter_1' ] tmp2 = data.loc[:,[ 'year1' , 'year2' , 'year3' , 'year4' , 'year5' , 'year6' , 'year7' , 'year8' , 'year9' ]].rolling( 2 , center = False , axis = 0 ).mean() tmp2.columns = [ 'year1Filter_2' , 'year2Filter_2' , 'year3Filter_2' , 'year4Filter_2' , 'year5Filter_2' , 'year6Filter_2' , 'year7Filter_2' , 'year8Filter_2' , 'year9Filter_2' ] tmp3 = data.loc[:,[ 'year1' , 'year2' , 'year3' , 'year4' , 'year5' , 'year6' , 'year7' , 'year8' , 'year9' ]].rolling( 3 , center = False , axis = 0 ).mean() tmp3.columns = [ 'year1Filter_3' , 'year2Filter_3' , 'year3Filter_3' , 'year4Filter_3' , 'year5Filter_3' , 'year6Filter_3' , 'year7Filter_3' , 'year8Filter_3' , 'year9Filter3' ] .... tmp100 = data.loc[:,[ 'year1' , 'year2' , 'year3' , 'year4' , 'year5' , 'year6' , 'year7' , 'year8' , 'year9' ]].rolling( 100 , center = False , axis = 0 ).mean() tmp3.columns = [ 'year1Filter_100' , 'year2Filter_100' , 'year3Filter_100' , 'year4Filter_100' , 'year5Filter_100' , 'year6Filter_100' , 'year7Filter_100' , 'year8Filter_100' , 'year9Filter100' ] data = pd.concat((data, tmp1, tmp2,tmp3,...tmp100), axis = 1 ) sel1 = data.columns. map ( lambda x: bool (re.search( 'year1' ,x))) out1 = data[data.columns[sel1]] sel2 = data.columns. map ( lambda x: bool (re.search( 'year2' ,x))) out2 = data[data.columns[sel2]] sel3 = data.columns. map ( lambda x: bool (re.search( 'year3' ,x))) out3 = data[data.columns[sel3]] ... sel9 = data.columns. map ( lambda x: bool (re.search( 'year9' ,x))) out3 = data[data.columns[sel3]] #Export results to Excel writer = pd.ExcelWriter( 'year1.xlsx' ) out1.to_excel(writer, 'data' ) writer.save() ... writer = pd.ExcelWriter( 'year9.xlsx' ) out9.to_excel(writer, 'data' ) writer.save() |