Python Forum
Looking for a Python function like Rapidminer's Windowing operator
#1
Hi all. I'm a relatively new Python user, though I have some experience with R and RapidMiner. RapidMiner has an operator called "Windowing" that essentially takes a time series and converts it into a windowed example set.

For example, if I have the following time series:

#index	A	B	C
#date_1	1	2	3
#date_2	4	5	6
#date_3	7	8	9
and I applied Windowing with window_size = 2, I would get the following:

#index	A	B	C	A-1	B-1	C-1	A-2	B-2	C-2
#date_1	1	2	3	nan	nan	nan	nan	nan	nan
#date_2	4	5	6	1	2	3	nan	nan	nan
#date_3	7	8	9	4	5	6	1	2	3
Documentation for the RapidMiner operator can be found here (https://docs.rapidminer.com/9.2/studio/o...owing.html).

I am trying to find a function (preferably in pandas, SciPy, etc.) that lets me perform the same operation in Python. So far, the only functions I have found compute rolling averages, sums, or other basic statistics of the data, which is not what I'm trying to do.

Any help would be greatly appreciated!
#2
I found a solution. I'll park it here in case anyone needs this in the future:
https://machinelearningmastery.com/conve...em-python/

More specifically, another user in the comments posted an adjusted version that I found particularly helpful: it carries the original column names through to the end result.

Here is the specific function that did the trick:
from pandas import DataFrame, concat
 
def time_series_to_supervised(data, n_lag=1, n_fut=1, selLag=None, selFut=None, dropnan=True):
    """
    Converts a time series to a supervised learning data set by adding time-shifted prior and future period
    data as input or output (i.e., target result) columns for each period
    :param data:  a series of periodic attributes as a list, NumPy array, or pandas DataFrame (pass a DataFrame to keep column names)
    :param n_lag: number of PRIOR periods to lag as input (X); generates: Xa(t-1), Xa(t-2); min= 0 --> nothing lagged
    :param n_fut: number of FUTURE periods to add as target output (y); generates Yout(t+1); min= 0 --> no future periods
    :param selLag:  only copy these specific PRIOR period attributes; default= None; EX: ['Xa', 'Xb' ]
    :param selFut:  only copy these specific FUTURE period attributes; default= None; EX: ['rslt', 'xx']
    :param dropnan: True= drop rows with NaN values; default= True
    :return: a Pandas DataFrame of time series data organized for supervised learning
    NOTES:
    (1) The current period's data is always included in the output.
    (2) A suffix is added to the original column names to indicate a relative time reference: e.g., (t) is the current
        period; (t-2) is from two periods in the past; (t+1) is from the next period
    (3) This is an extension of Jason Brownlee's series_to_supervised() function, customized for MFI use
    """
    df = DataFrame(data)
    n_vars = df.shape[1]  # count columns after conversion so nested lists and Series also work
    origNames = df.columns
    cols, names = list(), list()
    # include all current period attributes
    cols.append(df.shift(0))
    names += [('%s' % origNames[j]) for j in range(n_vars)]
 
    # lag any past period attributes (t-n_lag,...,t-1)
    n_lag = max(0, n_lag)  # force valid number of lag periods
    for i in range(n_lag, 0, -1):
        suffix = '(t-%d)' % i
        if selLag is None:  # copy all attributes from PRIOR periods
            cols.append(df.shift(i))
            names += ['%s%s' % (origNames[j], suffix) for j in range(n_vars)]
        else:  # copy only selected prior attributes
            for var in selLag:
                cols.append(df[var].shift(i))
                names.append('%s%s' % (var, suffix))
 
    # include future period attributes (t+1,...,t+n_fut)
    n_fut = max(n_fut, 0)  # force valid number of future periods to shift back
    for i in range(1, n_fut + 1):
        suffix = '(t+%d)' % i
        if selFut is None:  # copy all attributes from future periods
            cols.append(df.shift(-i))
            names += ['%s%s' % (origNames[j], suffix) for j in range(n_vars)]
        else:  # copy only selected future attributes
            for var in selFut:
                cols.append(df[var].shift(-i))
                names.append('%s%s' % (var, suffix))
    # combine everything
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values introduced by lagging
    if dropnan:
        agg.dropna(inplace=True)
    return agg
Thanks to Jason and Michael from machinelearningmastery.com!
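
For anyone who wants the exact layout from my first post (columns named A-1, B-1, ... and the NaN rows kept), here is a minimal sketch built on the same DataFrame.shift and concat idea. The name window_frame and its window_size argument are my own, chosen for illustration. I have only checked it against the small table above, so I can't promise it matches RapidMiner's operator in every detail. Note that the function above drops NaN rows by default (dropnan=True), while this sketch keeps them.

```python
import pandas as pd


def window_frame(df, window_size):
    """Append lagged copies of every column, named <col>-<lag>.

    Hypothetical helper for illustration; not part of pandas or RapidMiner.
    """
    parts = [df]  # current-period values come first
    for lag in range(1, window_size + 1):
        shifted = df.shift(lag)  # row t now holds the values from row t-lag
        shifted.columns = [f"{col}-{lag}" for col in df.columns]
        parts.append(shifted)
    # Align on the index and place the lagged blocks side by side
    return pd.concat(parts, axis=1)


# The example time series from the first post
ts = pd.DataFrame(
    {"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]},
    index=["date_1", "date_2", "date_3"],
)

windowed = window_frame(ts, window_size=2)
print(windowed)
```

Calling windowed.dropna() afterwards keeps only the rows that have a full window, which in this example is date_3.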