Hi all. I'm a relatively new Python user, though I have a bit of experience in R and RapidMiner. RapidMiner has an operator called "Windowing" that essentially takes a time series and converts it into a windowed example set.
For example, if I have the following time series:
#index   A  B  C
#date_1  1  2  3
#date_2  4  5  6
#date_3  7  8  9
and I applied Windowing with window_size = 2, I would get the following:
#index   A  B  C  A-1  B-1  C-1  A-2  B-2  C-2
#date_1  1  2  3  nan  nan  nan  nan  nan  nan
#date_2  4  5  6  1    2    3    nan  nan  nan
#date_3  7  8  9  4    5    6    1    2    3
Documentation for the RapidMiner operator can be found here (https://docs.rapidminer.com/9.2/studio/o...owing.html).
I am trying to find a function (preferably in pandas, SciPy, etc.) that lets me do the same operation in Python. So far, the only functions I have found compute rolling averages, sums, or other basic statistics over the data, which is not what I'm trying to do.
Any help would be greatly appreciated!
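For reference, the windowing described above can be approximated with plain pandas by concatenating shifted copies of the frame. This is only a minimal sketch: the helper name `window` and the sample frame `ts` are made up for illustration, and it assumes the rows are already sorted by date and that the column names are strings.

```python
import pandas as pd


def window(df, window_size):
    """Hypothetical helper: append lagged copies of every column.

    For each lag k in 1..window_size, the columns are shifted down by k rows
    and renamed with a "-k" suffix, mimicking the table in the question.
    """
    frames = [df]
    for lag in range(1, window_size + 1):
        frames.append(df.shift(lag).add_suffix(f"-{lag}"))
    return pd.concat(frames, axis=1)


# Sample data copied from the question
ts = pd.DataFrame(
    {"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]},
    index=["date_1", "date_2", "date_3"],
)

windowed = window(ts, window_size=2)
print(windowed)
```

Rows that do not have enough history keep NaN in the lagged columns, as in the expected output above; calling `dropna()` on the result would remove them.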
I found a solution. I'll park it here in case anyone needs this in the future:
https://machinelearningmastery.com/conve...em-python/
More specifically, another user in the comments made an adjustment that I found particularly helpful: it keeps the original column names in the end result.
Here is the specific function that did the trick:
from pandas import DataFrame
from pandas import concat

def time_series_to_supervised(data, n_lag=1, n_fut=1, selLag=None, selFut=None, dropnan=True):
    """
    Converts a time series to a supervised learning data set by adding time-shifted prior and future period
    data as input or output (i.e., target result) columns for each period
    :param data: a series of periodic attributes as a list, NumPy array, or pandas DataFrame
    :param n_lag: number of PRIOR periods to lag as input (X); generates: Xa(t-1), Xa(t-2); min= 0 --> nothing lagged
    :param n_fut: number of FUTURE periods to add as target output (y); generates Yout(t+1); min= 0 --> no future periods
    :param selLag: only copy these specific PRIOR period attributes; default= None; EX: ['Xa', 'Xb' ]
    :param selFut: only copy these specific FUTURE period attributes; default= None; EX: ['rslt', 'xx']
    :param dropnan: True= drop rows with NaN values; default= True
    :return: a Pandas DataFrame of time series data organized for supervised learning
    NOTES:
    (1) The current period's data is always included in the output.
    (2) A suffix is added to the original column names to indicate a relative time reference: e.g., (t) is the current
        period; (t-2) is from two periods in the past; (t+1) is from the next period
    (3) This is an extension of Jason Brownlee's series_to_supervised() function, customized for MFI use
    """
    df = DataFrame(data)
    n_vars = df.shape[1]  # count columns on the DataFrame so 1-D and 2-D inputs both work
    origNames = df.columns
    cols, names = list(), list()
    # include all current period attributes
    cols.append(df.shift(0))
    names += [('%s' % origNames[j]) for j in range(n_vars)]
    # lag any past period attributes (t-n_lag,...,t-1)
    n_lag = max(0, n_lag)  # force valid number of lag periods
    for i in range(n_lag, 0, -1):
        suffix = '(t-%d)' % i
        if selLag is None:  # copy all attributes from PRIOR periods?
            cols.append(df.shift(i))
            names += [('%s%s' % (origNames[j], suffix)) for j in range(n_vars)]
        else:  # copy only selected prior attributes
            for var in selLag:
                cols.append(df[var].shift(i))
                names += [('%s%s' % (var, suffix))]
    # include future period attributes (t+1,...,t+n_fut)
    n_fut = max(n_fut, 0)  # force valid number of future periods to shift back
    for i in range(1, n_fut + 1):
        suffix = '(t+%d)' % i
        if selFut is None:  # copy all attributes from future periods?
            cols.append(df.shift(-i))
            names += [('%s%s' % (origNames[j], suffix)) for j in range(n_vars)]
        else:  # copy only selected future attributes
            for var in selFut:
                cols.append(df[var].shift(-i))
                names += [('%s%s' % (var, suffix))]
    # combine everything
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values introduced by lagging
    if dropnan:
        agg.dropna(inplace=True)
    return agg
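To see what the function does under the hood, here is a small standalone sketch of the same shift / concat / dropna steps applied to the sample data from my question. It does not call the function itself, and the variable names (`ts`, `lagged`, `future`, `agg`, `complete`) are just for illustration. It assumes one lag period and one future period, which are the function's defaults.

```python
import pandas as pd

# Sample data copied from the question
ts = pd.DataFrame(
    {"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]},
    index=["date_1", "date_2", "date_3"],
)

# shift(1) moves values down one row, so each row sees the previous period
lagged = ts.shift(1).add_suffix("(t-1)")
# shift(-1) moves values up one row, so each row sees the next period
future = ts.shift(-1).add_suffix("(t+1)")

# current period, then lagged columns, then future columns, side by side
agg = pd.concat([ts, lagged, future], axis=1)

# rows lacking a previous or next period contain NaN and are dropped
complete = agg.dropna()
print(complete)
```

With only three dates, the first row has no prior period and the last row has no next period, so only date_2 survives the dropna step. To reproduce the RapidMiner table from my question (NaN rows kept), the function would be called with n_fut=0 and dropnan=False.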
Thanks to Jason and Michael from machinelearningmastery.com!