10-fold cross-validation on time series

ulrich48155 · May-05-2017, 01:24 PM

Thanks for your reply! I should've posted the whole code. It looks like this:

import pandas as pd
import numpy as np
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit

# Preparing data (Zeitstempel = timestamp, Waehrung = currency, AnzahlTweets = number of tweets, Kurs = price, Volumen = volume)
tweets = pd.read_csv('numTweets.csv', names=['Zeitstempel', 'Waehrung', 'AnzahlTweets'])
prices = pd.read_csv('prices.csv', names=['Zeitstempel', 'Waehrung', 'Kurs', 'Volumen'])
tweets1 = tweets.dropna(axis=1)
merged = prices.merge(tweets1, on='Zeitstempel')
del merged['Waehrung_y']
merged = merged.rename(columns={'Waehrung_x': 'Waehrung'})
# Filter currency
mergedf = merged[merged.Waehrung == 'BellaCoin']

tscv = TimeSeriesSplit(n_splits=10)
print(tscv)

X = mergedf['AnzahlTweets']
y = mergedf['Kurs']

X = X.values.reshape(-1, 1)
y = y.values.reshape(-1, 1)

for train_index, test_index in tscv.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

linreg = LinearRegression()
linreg.fit(X_train, y_train)
y_pred = linreg.predict(X_test)
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

After adding the reshape calls in lines 23 and 24, I don't get an error message anymore, so I think my first problem is solved.

If I understand this correctly, cross-validation for time series has to look like this:

Train: 1 Test: 2
Train: 1,2 Test: 3
Train: 1,2,3 Test: 4
...
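
To check that TimeSeriesSplit really produces this expanding-window pattern, here is a minimal sketch on four dummy observations. The array `X_demo` is a hypothetical stand-in, not the tweet/price data, and the indices printed are 0-based, so "Train: 1 Test: 2" shows up as TRAIN [0] TEST [1].

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Four dummy observations in time order (hypothetical data for illustration only).
X_demo = np.arange(4).reshape(-1, 1)

# Three splits over four samples: each test fold is the single next observation.
splits = [(train.tolist(), test.tolist())
          for train, test in TimeSeriesSplit(n_splits=3).split(X_demo)]

for train, test in splits:
    print("TRAIN:", train, "TEST:", test)
# TRAIN: [0] TEST: [1]
# TRAIN: [0, 1] TEST: [2]
# TRAIN: [0, 1, 2] TEST: [3]
```

With more samples than splits plus one, each test fold simply contains several consecutive observations instead of one.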

Is the RMSE then calculated as the mean of the RMSE values from all test folds, and is my approach correct?
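
One thing worth noting about the code as posted: the fit and predict calls come after the loop has finished, so they only use the last train/test split. A minimal sketch of the averaged version could look like the following, assuming the model is refitted inside the loop and the per-fold RMSE values are averaged at the end. The helper name `mean_rmse_over_splits` and the synthetic data are hypothetical and only there to make the example runnable without the CSV files.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit


def mean_rmse_over_splits(X, y, n_splits=10):
    """Fit on each training window, score on the test window that follows it,
    and return (mean RMSE, list of per-fold RMSE values)."""
    fold_rmse = []
    for train_index, test_index in TimeSeriesSplit(n_splits=n_splits).split(X):
        model = LinearRegression()
        model.fit(X[train_index], y[train_index])
        y_pred = model.predict(X[test_index])
        rmse = np.sqrt(mean_squared_error(y[test_index], y_pred))
        fold_rmse.append(float(rmse))
    return float(np.mean(fold_rmse)), fold_rmse


# Synthetic stand-in for AnzahlTweets (X) and Kurs (y); not real data.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 100, size=110).astype(float).reshape(-1, 1)
y_demo = 0.5 * X_demo.ravel() + rng.normal(0.0, 1.0, size=110)

mean_rmse, fold_rmse = mean_rmse_over_splits(X_demo, y_demo, n_splits=10)
print("RMSE per fold:", fold_rmse)
print("Mean RMSE:", mean_rmse)
```

Keeping the per-fold values next to the mean makes it easier to see whether the error changes as the training window grows.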

Thanks in advance!!!
