Python Forum (https://python-forum.io), Data Science forum
Looking for a Python function like Rapidminer's Windowing operator - pcartwright - Mar-24-2020

Hi all. I'm a relatively new Python user, though I have a bit of experience in R and RapidMiner. RapidMiner has an operator called "Windowing" that takes a time series and converts it into a windowed example set. For example, if I have the following time series:

```
#index  A  B  C
#date_1 1  2  3
#date_2 4  5  6
#date_3 7  8  9
```

and I apply Windowing with window_size = 2, I get the following:

```
#index  A  B  C  A-1  B-1  C-1  A-2  B-2  C-2
#date_1 1  2  3  nan  nan  nan  nan  nan  nan
#date_2 4  5  6  1    2    3    nan  nan  nan
#date_3 7  8  9  4    5    6    1    2    3
```

Documentation for the RapidMiner operator can be found here (https://docs.rapidminer.com/9.2/studio/operators/modeling/time_series/windowing/windowing.html).

I am trying to find a function (preferably in pandas, scipy, etc.) that performs the same operation in Python. So far, the only functions I have found compute rolling averages, sums, or other basic statistics of the data, which is not what I'm trying to do. Any help would be greatly appreciated!

RE: Looking for a Python function like Rapidminer's Windowing operator - pcartwright - Mar-25-2020

I found a solution. I'll park it here in case anyone needs it in the future: https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/

More specifically, another user in the comments made an adjustment that I found particularly helpful: it carries the original column names through to the end result.
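If the goal is exactly the layout shown in the question (RapidMiner-style `A-1` column names, with the NaN rows kept), pandas' `DataFrame.shift` combined with `pandas.concat` may be all that is needed. This is a minimal sketch, not RapidMiner's implementation: `windowing` is a hypothetical helper name, and none of the RapidMiner operator's other parameters are modeled.

```python
import pandas as pd


def windowing(df: pd.DataFrame, window_size: int) -> pd.DataFrame:
    """Hypothetical helper: append lagged copies of every column, RapidMiner-style.

    For each lag k in 1..window_size, every column X gets a copy named "X-k"
    holding the value from k rows earlier (NaN where no earlier row exists).
    """
    frames = [df]
    for k in range(1, window_size + 1):
        # shift(k) moves values down k rows; add_suffix renames A -> A-k
        frames.append(df.shift(k).add_suffix(f"-{k}"))
    # Place the original and all lagged frames side by side, aligned on the index
    return pd.concat(frames, axis=1)


# The example series from the question
data = pd.DataFrame(
    {"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]},
    index=["date_1", "date_2", "date_3"],
)
result = windowing(data, window_size=2)
print(result)
```

The lagged columns come back as floats because they contain NaN; call `.dropna()` on the result if only fully populated windows are wanted.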
Here is the specific function that did the trick:

```python
from pandas import DataFrame, concat


def time_series_to_supervised(data, n_lag=1, n_fut=1, selLag=None, selFut=None, dropnan=True):
    """
    Converts a time series to a supervised learning data set by adding time-shifted
    prior and future period data as input or output (i.e., target result) columns
    for each period.

    :param data: a series of periodic attributes as a list, NumPy array, or pandas
                 DataFrame (selLag/selFut refer to column names, so use a DataFrame for them)
    :param n_lag: number of PRIOR periods to lag as input (X); generates Xa(t-1), Xa(t-2);
                  min = 0 --> nothing lagged
    :param n_fut: number of FUTURE periods to add as target output (y); generates Yout(t+1);
                  min = 0 --> no future periods
    :param selLag: only copy these specific PRIOR period attributes; default = None; e.g. ['Xa', 'Xb']
    :param selFut: only copy these specific FUTURE period attributes; default = None; e.g. ['rslt', 'xx']
    :param dropnan: True = drop rows with NaN values; default = True
    :return: a pandas DataFrame of time series data organized for supervised learning

    NOTES:
    (1) The current period's data is always included in the output.
    (2) A suffix is added to the original column names to indicate a relative time
        reference: current-period columns keep their original names; (t-2) is from
        two periods in the past; (t+1) is from the next period.
    (3) This is an extension of Jason Brownlee's series_to_supervised() function,
        customized for MFI use.
    """
    df = DataFrame(data)
    # Count columns after conversion so lists of rows and 1-D arrays also work
    n_vars = df.shape[1]
    origNames = df.columns
    cols, names = [], []

    # include all current period attributes
    cols.append(df)
    names += ['%s' % origNames[j] for j in range(n_vars)]

    # lag any past period attributes (t-n_lag, ..., t-1)
    n_lag = max(0, n_lag)  # force a valid number of lag periods
    for i in range(n_lag, 0, -1):
        suffix = '(t-%d)' % i
        if selLag is None:  # copy all attributes from PRIOR periods
            cols.append(df.shift(i))
            names += ['%s%s' % (origNames[j], suffix) for j in range(n_vars)]
        else:  # copy only the selected prior attributes
            for var in selLag:
                cols.append(df[var].shift(i))
                names += ['%s%s' % (var, suffix)]

    # include future period attributes (t+1, ..., t+n_fut)
    n_fut = max(n_fut, 0)  # force a valid number of future periods to shift back
    for i in range(1, n_fut + 1):
        suffix = '(t+%d)' % i
        if selFut is None:  # copy all attributes from future periods
            cols.append(df.shift(-i))
            names += ['%s%s' % (origNames[j], suffix) for j in range(n_vars)]
        else:  # copy only the selected future attributes
            for var in selFut:
                cols.append(df[var].shift(-i))
                names += ['%s%s' % (var, suffix)]

    # combine everything side by side
    agg = concat(cols, axis=1)
    agg.columns = names

    # drop rows with NaN values introduced by lagging or shifting forward
    if dropnan:
        agg.dropna(inplace=True)
    return agg
```

Thanks to Jason and Michael from machinelearningmastery.com!
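To see what `n_lag`, `n_fut` and `selFut` produce in practice, here is a stripped-down sketch of the same shift-and-concat idea: every column is lagged by one period, only `C(t+1)` is added as the target, and the result is split into inputs and target. `lag_and_target`, the extra fourth row of data, and the choice of `C` as the target are illustrative assumptions; on this data it should yield the same columns as `time_series_to_supervised(data, n_lag=1, n_fut=1, selFut=['C'])`.

```python
import pandas as pd


def lag_and_target(df: pd.DataFrame, n_lag: int, target: str, n_fut: int = 1) -> pd.DataFrame:
    """Hypothetical helper: lagged inputs for all columns plus future copies of one target."""
    cols = [df]  # current period keeps the original column names
    for i in range(n_lag, 0, -1):
        # Past values, named like "A(t-1)"
        cols.append(df.shift(i).add_suffix(f"(t-{i})"))
    for i in range(1, n_fut + 1):
        # Future values of the target only, named like "C(t+1)"
        cols.append(df[[target]].shift(-i).add_suffix(f"(t+{i})"))
    # Edge rows contain NaN after shifting; drop them (the dropnan=True behaviour)
    return pd.concat(cols, axis=1).dropna()


data = pd.DataFrame({"A": [1, 4, 7, 10], "B": [2, 5, 8, 11], "C": [3, 6, 9, 12]})
sup = lag_and_target(data, n_lag=1, target="C", n_fut=1)

# Split into model inputs (X) and the value to predict (y)
y = sup["C(t+1)"]
X = sup.drop(columns="C(t+1)")
print(X)
print(y)
```

Only rows 1 and 2 survive: row 0 has no previous period and row 3 has no next period. This is also why the RapidMiner-style output in the question, which keeps those rows, corresponds to calling the function with `dropnan=False`.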