May-05-2017, 01:24 PM
Thanks for your reply! I should've posted the whole code. It looks like this:
import pandas as pd
import numpy as np
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit

# Preparing data
tweets = pd.read_csv('numTweets.csv', names=['Zeitstempel', 'Waehrung', 'AnzahlTweets'])
prices = pd.read_csv('prices.csv', names=['Zeitstempel', 'Waehrung', 'Kurs', 'Volumen'])
tweets1 = tweets.dropna(axis=1)
merged = prices.merge(tweets1, on='Zeitstempel')
del merged['Waehrung_y']
merged = merged.rename(columns={'Waehrung_x': 'Waehrung'})

# Filter currency
mergedf = merged[(merged.Waehrung == 'BellaCoin')]

tscv = TimeSeriesSplit(n_splits=10)
print(tscv)

X = mergedf['AnzahlTweets']
y = mergedf['Kurs']
# Reshape to 2D column arrays, as scikit-learn expects for X
X = X.values.reshape(-1, 1)
y = y.values.reshape(-1, 1)

for train_index, test_index in tscv.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    linreg = LinearRegression()
    linreg.fit(X_train, y_train)
    y_pred = linreg.predict(X_test)
    print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

After adding the two reshape(-1, 1) calls on X and y, I don't get an error message anymore, so I think my first problem is solved.
If I understand this right, cross-validation for time series has to look like this:
Train: 1 Test: 2
Train: 1,2 Test: 3
Train: 1,2,3 Test: 4
...
Is the RMSE then calculated as the mean of all the test results, and is my approach right?
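To make the question concrete, here is a minimal sketch of what I think the procedure looks like. It uses made-up synthetic data instead of my CSV files, and the names `folds`, `fold_rmse` and `mean_rmse` are just placeholders I chose for illustration. It collects one RMSE per fold and then averages them, which is, as far as I can tell, one common way to report a single score (not necessarily the only one).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit

# Synthetic stand-in data (assumption): 12 time-ordered samples,
# a linear trend plus a little noise with a fixed seed.
rng = np.random.default_rng(0)
X = np.arange(12, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0 + rng.normal(scale=0.5, size=12)

tscv = TimeSeriesSplit(n_splits=5)

folds = []      # (train indices, test indices) for each split
fold_rmse = []  # one RMSE value per split

for train_index, test_index in tscv.split(X):
    folds.append((train_index.tolist(), test_index.tolist()))
    # Fit only on the past, evaluate on the block that follows it
    model = LinearRegression().fit(X[train_index], y[train_index])
    y_pred = model.predict(X[test_index])
    fold_rmse.append(np.sqrt(mean_squared_error(y[test_index], y_pred)))

# Average of the per-fold RMSE values
mean_rmse = float(np.mean(fold_rmse))

for (train, test), rmse in zip(folds, fold_rmse):
    print("TRAIN:", train, "TEST:", test, "RMSE:", round(float(rmse), 3))
print("Mean RMSE over folds:", round(mean_rmse, 3))
```

With these settings each training set grows by one test block per split and always ends right before its test block, which matches the Train/Test pattern above (with 0-based indices). Note that averaging the per-fold RMSE values is not the same number as one RMSE computed over all test predictions pooled together.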
Thanks in advance!