10fold cross-validation on time series

***zivoni*** · May-04-2017, 09:16 PM

You do not show your input data, so it's hard to say what is going wrong. Since sklearn works with numpy arrays, the indices from tcsv.split() will be integers in range(0, len(df)) (assuming that tcsv is an instance of sklearn.model_selection.TimeSeriesSplit). If your dataframe's index has values other than 0...len(df)-1, then label-based subsetting with test_index or train_index would give NaN for the index values that do not exist in the dataframe.
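Since the original data is not shown, here is a small sketch of how that mismatch could arise. The dataframe, its column name `y`, and the index range 100..119 are made up for illustration, and `reindex` is used only to make the label mismatch visible (depending on the pandas version, `.loc` with missing labels may raise a KeyError instead of returning NaN).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical frame whose index does not start at 0 (e.g. after filtering rows).
df = pd.DataFrame({"y": np.arange(20.0)}, index=np.arange(100, 120))

tscv = TimeSeriesSplit(n_splits=3)
train_index, test_index = next(iter(tscv.split(df)))

# The split yields positions in range(0, len(df)), not index labels.
print(test_index)

# Positional selection works regardless of what the index labels are.
test_by_position = df.iloc[test_index]

# Label-based alignment finds none of these labels, so every value is NaN.
test_by_label = df["y"].reindex(test_index)
print(test_by_label.isna().all())
```

So either select with `.iloc`, or call `df.reset_index(drop=True)` before splitting if the original labels are not needed.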

You probably want to put the model fitting/predicting and the RMSE computation inside your for loop, so that you actually use your folds and train 9? models (right now all you do is prediction/evaluation on the last fold). And for time series it's not exactly 90% train data, 10% test data: since you generally want to avoid predictions based on "future" data, for testing on the k-th fold you can train only on data from fold 1 to fold k-1. So the first train fold is the first 10% of the data, the first test fold is the following 10%, the second train fold is the first 20%, the second test fold is the following 10%, and so on.
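Something along these lines is what I mean by moving everything into the loop. This is only a sketch: the data is synthetic, and LinearRegression stands in for whatever model you are actually using. With n_splits=9 on 100 samples, TimeSeriesSplit gives 9 train/test pairs where the training part grows by 10 rows each time and the test part is the next 10 rows.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit

# Synthetic, time-ordered data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 100
X = np.arange(n, dtype=float).reshape(-1, 1)
y = 0.5 * X.ravel() + rng.normal(scale=1.0, size=n)

tscv = TimeSeriesSplit(n_splits=9)
rmse_per_fold = []
fold_sizes = []

for train_index, test_index in tscv.split(X):
    # Fit a fresh model on the past only, then evaluate on the next block.
    model = LinearRegression()
    model.fit(X[train_index], y[train_index])
    pred = model.predict(X[test_index])
    rmse = np.sqrt(mean_squared_error(y[test_index], pred))
    rmse_per_fold.append(rmse)
    fold_sizes.append((len(train_index), len(test_index)))

print(fold_sizes)               # (train size, test size) for each fold
print(np.mean(rmse_per_fold))   # average RMSE over all folds
```

Averaging the per-fold RMSE gives one overall score, but it is worth looking at the individual values too, since the early folds are trained on much less data.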
