Jun-08-2017, 02:25 PM
For a project I needed to calculate and display a 10-fold cross-validation on a time series. After plotting, my results look like this:
http://imgur.com/a/15wbF
As you can see, both plots also contain the first fold, which I circled in green. This fold is not of interest and I would like to remove it. Because I work with time series data, my 10-fold cross-validation has this structure:
Train 0 - Test 1
Train 1 - Test 2
Train 1,2 - Test 3
Train 1,2,3 - Test 4
...
Train 1,2,3,4,5,6,7,8,9 - Test 10
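To make that structure concrete, here is a minimal sketch on made-up data. The 22-sample array and the names X_demo, tscv_demo and folds are only for illustration and are not my real data. It prints which index ranges TimeSeriesSplit uses for training and testing in each of the 10 splits:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Illustrative data only: 22 samples split into 11 chunks of 2
X_demo = np.arange(22).reshape(-1, 1)
tscv_demo = TimeSeriesSplit(n_splits=10)

folds = []
for train_index, test_index in tscv_demo.split(X_demo):
    folds.append((train_index, test_index))
    # Show the first and last index of each training and test window
    print('train %2d..%2d -> test %2d..%2d'
          % (train_index[0], train_index[-1], test_index[0], test_index[-1]))
```

In this sketch the first chunk (indices 0 and 1) only ever appears in training sets and is never part of a test set, so nothing is predicted for it.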
My code looks like this:
# Imports (mergedf is my merged pandas DataFrame, created earlier)
import numpy as np
import matplotlib.pyplot as pl
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=10)
X = mergedf['AnzahlTweets']
y = mergedf['Kurs']
X = X.values.reshape(-1, 1)
y = y.values.reshape(-1, 1)

# Cross-validation
linreg = LinearRegression()
rmse = []
prediction = np.zeros(y.shape)
for train_index, test_index in tscv.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    linreg.fit(X_train, y_train)
    y_pred = linreg.predict(X_test)
    prediction[test_index] = y_pred
    rmse.append(np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
    print('RMSE: %.10f' % np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

# Plotting: actual vs. predicted over time
fig, axes = pl.subplots()
pl.plot(y, label='Actual')
pl.plot(prediction, color='red', label='Predicted')
pl.ylabel('Price')
pl.xlabel('Fold')
pl.gca().xaxis.grid(True)
# 11 fold boundaries; the last one is left unlabeled
pl.setp(axes, xticks=[51, 98, 145, 192, 239, 286, 333, 380, 427, 474, 521],
        xticklabels=[' 1', ' 2', ' 3', ' 4', ' 5', ' 6', ' 7', ' 8', ' 9', ' 10', ''])
pl.legend()
pl.show()

# Plotting: predicted vs. actual with a fitted line
prediction = prediction[:, 0]
y = y[:, 0]
m, b = np.polyfit(prediction, y, 1)
plrange = np.arange(0, 0.000001, 0.00000005)
pl.plot(prediction, y, 'ro')
pl.plot(prediction, m * prediction + b)
pl.xlabel('Predicted')
pl.ylabel('Actual')
pl.xlim()
pl.gca().xaxis.grid(True)
pl.show()

Now my question: Is it possible to remove the first fold (Train 0 - Test 1) before plotting?
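What I have in mind is roughly the following sketch, which remembers where the first test window starts and slices both arrays from there. It runs on synthetic data, and the names n, y_demo, prediction_demo, first_test, x_axis, y_plot and prediction_plot are made up for illustration. I am not sure this is the right approach:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n = 22  # made-up series length, not my real data
y_demo = np.linspace(1.0, 2.0, n)
prediction_demo = np.zeros(n)

first_test = None
for train_index, test_index in TimeSeriesSplit(n_splits=10).split(y_demo.reshape(-1, 1)):
    if first_test is None:
        first_test = test_index[0]  # start of the first predicted window
    # Stand-in for linreg.predict(X_test); copies the true values
    prediction_demo[test_index] = y_demo[test_index]

# Keep only the part of the series that was actually predicted
x_axis = np.arange(first_test, n)
y_plot = y_demo[first_test:]
prediction_plot = prediction_demo[first_test:]
```

The sliced arrays could then be passed to pl.plot(x_axis, y_plot) and pl.plot(x_axis, prediction_plot) so that the x positions still line up with the fold boundaries.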
Thanks in advance!