May-04-2017, 09:16 PM
You do not show your input data, so it's hard to say what is going wrong. As sklearn works with numpy arrays, the indices from tcsv.split() will be integers in range(0, len(df)) (assuming that tcsv is an instance of sklearn.model_selection.TimeSeriesSplit). If your dataframe has an index with values other than 0...len(df)-1, then label-based subsetting with test_index or train_index would lead to NaN (or a KeyError in newer pandas versions) for some index values. Selecting rows by position with .iloc avoids this.

You probably want to put model fitting/predicting and the RMSE calculation inside your for loop, so that you actually use your folds and train one model per fold (9?); right now all you do is prediction/evaluation on the last fold. And for time series it's not exactly 90% train data, 10% test data: as you generally want to avoid prediction based on "future" data, when testing on the k-th fold you can train only on data from fold 1 to fold k-1. So the first train fold is the first 10% of the data, the first test fold is the following 10%, the second train fold is the first 20%, the second test fold is the following 10%, and so on.
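
Here is a rough sketch of both points (positional indexing and fitting inside the loop). Since the original data and model were not shown, everything concrete here is an assumption made up for illustration: the dataframe df with a date index, the columns x and y, LinearRegression as a stand-in model, and n_splits=9.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical data: 100 daily observations with a DatetimeIndex,
# so the index labels are NOT 0..len(df)-1.
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame(
    {"x": np.arange(n, dtype=float)},
    index=pd.date_range("2017-01-01", periods=n, freq="D"),
)
df["y"] = 2.0 * df["x"] + rng.normal(scale=1.0, size=n)

tcsv = TimeSeriesSplit(n_splits=9)  # assumed number of splits
fold_rmse = []
fold_sizes = []

for train_index, test_index in tcsv.split(df):
    # split() yields integer positions, so select rows by position (.iloc),
    # not by label.
    train, test = df.iloc[train_index], df.iloc[test_index]

    # Fit a fresh model on the past data of this fold only.
    model = LinearRegression()
    model.fit(train[["x"]], train["y"])
    pred = model.predict(test[["x"]])

    # RMSE for this fold, computed inside the loop.
    rmse = np.sqrt(mean_squared_error(test["y"], pred))
    fold_rmse.append(rmse)
    fold_sizes.append((len(train), len(test)))

print(fold_sizes)          # (train size, test size) per fold
print(np.mean(fold_rmse))  # average RMSE over all folds
```

With 100 rows and 9 splits, the training window grows by 10 rows per fold while each test window is the next 10 rows, which is the expanding scheme described above.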