Python Forum

Differencing Time series and Inverse after Training
Hello,

I have a non-stationary time series and I want to predict the target variable in the future. For simplicity, let's say that the target variable is a price and the features are economic factors that have an impact on that price. Currently, I am using a Random Forest, and so far my code looks like this (nothing special):

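# Imports

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Load Dataset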

df = pd.read_csv("C:FINAL.csv")
df["Date"] = pd.to_datetime(df['Date'])
df = df.set_index('Date')
df.index.freq = 'MS' 

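# Lag variables: each feature row holds the values observed 4 periods earlier

z = df.shift(4).iloc[4:]  # drop the first 4 rows, which are NaN after the shift
df = df.iloc[4:]          # trim the target frame so it stays aligned with z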

# Define variables 

y = df["price"].values
X = z.drop(["price"], axis = 1).values

# Train - Test Split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# RF

rf = RandomForestRegressor(bootstrap=False, max_depth=150, max_features="sqrt", 
      min_samples_leaf=2, min_samples_split=8, n_estimators=100)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
R2 = r2_score(y_test, y_pred)
print('Mean Absolute Error:', mean_absolute_error(y_test, y_pred))
print("R2:", R2)
The results are reasonable, but now I want to take the first difference of the target variable, so

dy(t) = y(t+1) - y(t)

in order to remove the trend of the time series. However, I also want to invert this operation after training, so that I get an MAE I can interpret, because the MAE will obviously be much lower when it is calculated on the differences only. Does anyone know how I can do that conveniently? I don't even know if it's possible, because the predictions come from forecasting and not from the differencing itself (i.e. there is no "original" time series to use for inverse differencing). Thanks in advance!
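One possible way to approach this, as a minimal sketch rather than a definitive answer: the predicted differences can be turned back into price levels by adding them to a known level. The helper names below (`difference`, `invert_one_step`, `invert_multi_step`, `mean_abs_error`) and the toy numbers are made up for illustration and are not part of the code above; `pred_diffs` stands in for whatever `rf.predict(X_test)` would return after training on the differenced target. The sketch uses plain Python lists so it runs without dependencies; `np.diff` and `np.cumsum` do the same job on arrays. Note that differencing shortens the series by one observation, so the feature rows would have to be trimmed to match.

```python
from itertools import accumulate


def difference(levels):
    """First difference d[t] = y[t+1] - y[t]; the result is one element shorter."""
    return [nxt - cur for cur, nxt in zip(levels[:-1], levels[1:])]


def invert_one_step(pred_diffs, prev_levels):
    """One-step-ahead: add each predicted change to the previous OBSERVED level."""
    return [prev + d for prev, d in zip(prev_levels, pred_diffs)]


def invert_multi_step(pred_diffs, last_known_level):
    """Multi-step: cumulatively sum the predicted changes onto one anchor level."""
    return list(accumulate(pred_diffs, initial=last_known_level))[1:]


def mean_abs_error(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy price series (made-up numbers, not the original data).
prices = [100.0, 102.0, 105.0, 104.0, 108.0, 111.0]
diffs = difference(prices)

# Assumption: the last two differences form the test set and a model
# trained on the differences predicted these changes for them.
pred_diffs = [3.0, 2.0]          # stand-in for rf.predict(X_test)
actual_levels = prices[-2:]      # the true prices for the test period

# Variant 1: the previous true price is known at prediction time.
one_step = invert_one_step(pred_diffs, prices[-3:-1])

# Variant 2: only the last training price is known; errors accumulate.
multi_step = invert_multi_step(pred_diffs, prices[-3])

print("MAE on levels, one-step:  ", mean_abs_error(actual_levels, one_step))
print("MAE on levels, multi-step:", mean_abs_error(actual_levels, multi_step))
```

In the one-step variant the previous true level cancels out when the error is computed, so the mean absolute error on the reconstructed levels equals the mean absolute error on the differences and is already in price units. Only the multi-step variant, where each forecast builds on earlier forecasts, yields a different (typically larger) error on the levels.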