Simple pandas question - Printable Version

Simple pandas question - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: Simple pandas question (/thread-35641.html)

Simple pandas question - mcva - Nov-25-2021

I´m using the following code to estimate the moving average for each column (year X) of a dataframe considering a window (pandas.DataFrame.rolling first parameter) that varies from 1 to 100.
The code is fine but when I need to increase the number of columns in the dataframe I have to include a lot of new lines, hence the probability of making a mistake increases considerably. I have tried to solve this using a numpy array but I came across a very different problem that i was not able to solve.

Can this code be improved to handle the increase of the dataframe rows?
Thank you

# Import data
data = pd.read_excel('moving_average.xlsx', index_col=0, header=0)
data.columns = ['year1','year2','year3','year4', 'year5','year6','year7', 'year8', 'year9']


# Add filtered data
tmp1 = data.loc[:,['year1','year2','year3','year4', 'year5','year6','year7', 'year8', 'year9']].rolling(1, center=False, axis=0).mean()
tmp1.columns = ['year1Filter_1', 'year2Filter_1', 'year3Filter_1','year4Filter_1','year5Filter_1','year6Filter_1','year7Filter_1','year8Filter_1','year9Filter_1']

tmp2 = data.loc[:,['year1','year2','year3','year4', 'year5','year6','year7', 'year8', 'year9']].rolling(2, center=False, axis=0).mean()
tmp2.columns = ['year1Filter_2', 'year2Filter_2', 'year3Filter_2','year4Filter_2','year5Filter_2','year6Filter_2','year7Filter_2','year8Filter_2','year9Filter_2']

tmp3 = data.loc[:,['year1','year2','year3','year4', 'year5','year6','year7', 'year8', 'year9']].rolling(3, center=False, axis=0).mean()
tmp3.columns = ['year1Filter_3', 'year2Filter_3', 'year3Filter_3','year4Filter_3','year5Filter_3','year6Filter_3','year7Filter_3','year8Filter_3','year9Filter3']

....
tmp100 = data.loc[:,['year1','year2','year3','year4', 'year5','year6','year7', 'year8', 'year9']].rolling(100, center=False, axis=0).mean()
tmp3.columns = ['year1Filter_100', 'year2Filter_100', 'year3Filter_100','year4Filter_100','year5Filter_100','year6Filter_100','year7Filter_100','year8Filter_100','year9Filter100']

data = pd.concat((data, tmp1, tmp2,tmp3,...tmp100), axis=1)

sel1 = data.columns.map(lambda x: bool(re.search('year1',x)))
out1 = data[data.columns[sel1]]

sel2 = data.columns.map(lambda x: bool(re.search('year2',x)))
out2 = data[data.columns[sel2]]

sel3 = data.columns.map(lambda x: bool(re.search('year3',x)))
out3 = data[data.columns[sel3]]
...
sel9 = data.columns.map(lambda x: bool(re.search('year9',x)))
out3 = data[data.columns[sel3]]

#Export results to Excel
writer = pd.ExcelWriter('year1.xlsx')
out1.to_excel(writer,'data')
writer.save()
...
writer = pd.ExcelWriter('year9.xlsx')
out9.to_excel(writer,'data')
writer.save()

RE: Simple pandas question - jefsummers - Nov-30-2021

Yes, a bit messy and I am sure can be simplified, but it is unclear to me what you want to do. Can you explain better (words, I see the code)?

RE: Simple pandas question - mcva - Nov-30-2021

(Nov-30-2021, 03:13 AM)jefsummers Wrote: Yes, a bit messy and I am sure can be simplified, but it is unclear to me what you want to do. Can you explain better (words, I see the code)?

I have a dataframe with 1000 columns all with the same length (365 values), and I need to calculate de moving average of each column. But I need to do this for diferent periods, from 1 to 100. A period is the pandas.DataFrame.rolling function first parameter.
So for each of the 1000 columns (years) I need to get 100 new datasets (They will have a diferent length with the increase of the moving average period, this is the reason why I can´t use a numpy array instead of a dataframe)

In my code I´m doing the following:

1) tm1, tm2 and tm3 are the moving averages considering a period of 1, 2 and 3 for each of the nine columns. I need to do this until I reach tm100...
2) Then I´m using concat() to create a single dataframe.
3) I´m using re.search() to reorganize the columns by year.
4) Export to excel.

Thank you

RE: Simple pandas question - jefsummers - Dec-05-2021

Use a list of dataframes, tm[0..100] rather than 100 variables. Will be way easier to maintain. You can still use all of your other methods, but put the process into loop or list comprehensions - will be shorter, easier to maintain, and faster.

RE: Simple pandas question - mcva - Dec-17-2021

This is a good solution...

import pandas as pd
import numpy as np

# Import data
data = pd.read_excel('moving_average.xlsx', index_col=0, header=0)

results = []
for i in range(100):
    tmp = data.rolling(i+1, center=False, axis=0).mean()
    results.append(tmp)

df=pd.concat(results, axis=1)

df.to_excel("output.xlsx")