Python Forum
10fold cross-validation on time series
#3
Thanks for your reply! I should've posted the whole code. It looks like this:

import pandas as pd
import numpy as np
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit

# Preparing data
tweets = pd.read_csv('numTweets.csv', names=['Zeitstempel', 'Waehrung', 'AnzahlTweets'])
prices = pd.read_csv('prices.csv', names=['Zeitstempel', 'Waehrung', 'Kurs', 'Volumen'])
tweets1 = tweets.dropna(axis=1)  # axis=1 drops every column that contains a NaN
merged = prices.merge(tweets1, on='Zeitstempel')  # shared 'Waehrung' column gets suffixes _x/_y
del merged['Waehrung_y']
merged = merged.rename(columns={'Waehrung_x': 'Waehrung'})
# Filter currency
mergedf = merged[merged.Waehrung == 'BellaCoin']

tscv = TimeSeriesSplit(n_splits=10)
print(tscv)

X = mergedf['AnzahlTweets']
y = mergedf['Kurs']

# scikit-learn expects a 2-D feature array of shape (n_samples, n_features)
X = X.values.reshape(-1, 1)
y = y.values.reshape(-1, 1)

for train_index, test_index in tscv.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

# These lines sit outside the loop, so they run once, on the last fold's split only
linreg = LinearRegression()
linreg.fit(X_train, y_train)
y_pred = linreg.predict(X_test)
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
After adding the two `.values.reshape(-1,1)` calls, I don't get an error message anymore, so I think my first problem is solved.
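Why the reshape helps, as a small sketch with made-up numbers (not taken from the poster's CSV files): a pandas column's `.values` is a 1-D array, while scikit-learn estimators expect `X` as a 2-D array of shape `(n_samples, n_features)`.

```python
import numpy as np

# A single pandas column's .values is 1-D: shape (n_samples,)
counts = np.array([12, 7, 30, 18])  # hypothetical tweet counts
print(counts.shape)  # (4,)

# reshape(-1, 1) turns the one feature into a single column;
# -1 tells NumPy to infer the number of rows (samples)
X = counts.reshape(-1, 1)
print(X.shape)  # (4, 1)
```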

If I understand it correctly, cross-validation for time series should look like this, right?

Train: 1 Test: 2
Train: 1, 2 Test: 3
Train: 1, 2, 3 Test: 4
...
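One way to check that schema is to print the splits `TimeSeriesSplit` produces on a toy series. This is a sketch with 10 hypothetical samples and `n_splits=4`, so each test block holds 2 samples; with the default settings the training window always starts at the first sample and grows, and each test block immediately follows it.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Ten time-ordered toy samples (hypothetical data; only the order matters here)
X = np.arange(10).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=4)
splits = list(tscv.split(X))
for train_index, test_index in splits:
    print("TRAIN:", train_index, "TEST:", test_index)
# TRAIN: [0 1] TEST: [2 3]
# TRAIN: [0 1 2 3] TEST: [4 5]
# TRAIN: [0 1 2 3 4 5] TEST: [6 7]
# TRAIN: [0 1 2 3 4 5 6 7] TEST: [8 9]
```

So the expanding-window pattern is right, except that blocks contain several samples rather than one, and the first training block also absorbs any remainder when the sample count does not divide evenly.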


Is the RMSE then calculated as the mean over all the test folds, and is my approach right?
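In the code as posted, the fit/predict/RMSE lines come after the `for` loop, so only the last split is scored, not an average. A minimal sketch of the per-fold version follows; it uses random stand-in data (hypothetical, in place of `AnzahlTweets` and `Kurs`), fits a fresh model inside the loop, and averages the fold RMSEs. The `cross_val_score` call is a shorter equivalent for the same splits.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Hypothetical stand-in data; replace with the real tweet counts (X) and prices (y)
rng = np.random.default_rng(0)
X = rng.integers(0, 100, size=110).reshape(-1, 1).astype(float)
y = 0.5 * X.ravel() + rng.normal(0, 5, size=110)

tscv = TimeSeriesSplit(n_splits=10)
fold_rmse = []
for train_index, test_index in tscv.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Fit and score inside the loop, so every fold gets its own model and RMSE
    linreg = LinearRegression()
    linreg.fit(X_train, y_train)
    y_pred = linreg.predict(X_test)
    fold_rmse.append(np.sqrt(mean_squared_error(y_test, y_pred)))

print("RMSE per fold:", np.round(fold_rmse, 2))
print("Mean RMSE:", np.mean(fold_rmse))

# Same splits via cross_val_score; scores are negated MSEs, one per fold
scores = cross_val_score(LinearRegression(), X, y, cv=tscv,
                         scoring="neg_mean_squared_error")
print("Mean RMSE (cross_val_score):", np.mean(np.sqrt(-scores)))
```

Averaging the per-fold RMSEs weights every fold equally; computing one RMSE over all pooled test predictions is a slightly different (also common) summary.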

Thanks in advance!!!


Messages In This Thread
RE: 10fold cross-validation on time series - by ulrich48155 - May-05-2017, 01:24 PM
