
Removing data in a plot
For a project, I needed to calculate and display a 10-fold cross-validation on a time series. After plotting, my results look like this:

As you can see, both plots also contain the first fold, which I circled in green. This fold is not meaningful and I would like to remove it. Because I am working with time-series data, my 10-fold cross-validation has this structure:

Train 0 - Test 1
Train 0,1 - Test 2
Train 0,1,2 - Test 3
Train 0,1,2,3 - Test 4
...
Train 0,1,2,3,4,5,6,7,8,9 - Test 10
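For reference, here is a minimal pure-Python sketch that mirrors how scikit-learn's TimeSeriesSplit lays out its folds with the default settings (test_size=None, gap=0, max_train_size=None): each test block has n_samples // (n_splits + 1) samples, and every training set contains all earlier samples, so chunk 0 is part of every training set. The helper name time_series_folds is hypothetical and not part of scikit-learn; it only reproduces the index layout so the structure above can be printed.

```python
def time_series_folds(n_samples, n_splits):
    """Hypothetical helper mirroring TimeSeriesSplit's default index layout.

    Returns a list of (train_indices, test_indices) pairs.
    """
    if n_splits + 1 > n_samples:
        raise ValueError("need at least n_splits + 1 samples")
    test_size = n_samples // (n_splits + 1)
    # The first test block starts after the leftover samples, so the first
    # training set absorbs any remainder of the integer division.
    first_test_start = n_samples - n_splits * test_size
    folds = []
    for test_start in range(first_test_start, n_samples, test_size):
        train = list(range(test_start))  # expanding window: all earlier samples
        test = list(range(test_start, test_start + test_size))
        folds.append((train, test))
    return folds


for i, (train, test) in enumerate(time_series_folds(22, 10), start=1):
    print(f"Fold {i}: train {train[0]}..{train[-1]}  test {test[0]}..{test[-1]}")
```

With 22 samples, fold 1 trains on indices 0..1 and tests on 2..3, while fold 10 trains on 0..19 and tests on 20..21.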

My code looks like this:

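import numpy as np
import matplotlib.pyplot as pl
from sklearn import metrics
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=10)

X = mergedf['AnzahlTweets']
y = mergedf['Kurs']
rmse = []  # one RMSE value per fold

# Cross-validation
for train_index, test_index in tscv.split(X):
    # split() yields positional indices, so use .iloc on the pandas Series
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    # (the model-fitting lines were garbled in the original post; y_pred is
    #  the model's prediction for X_test after fitting on X_train, y_train)
    rmse.append(np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
    print('RMSE: %.10f' % rmse[-1])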

# Plotting
fig, axes = pl.subplots()
pl.plot(prediction, color='red',label='Predicted',)
pl.setp(axes, xticks=[51,98,145,192,239,286,333,380,427,474,521], xticklabels=['          1','          2','          3', '          4','          5','          6','          7','          8','          9','          10'])

prediction = prediction[:,0]
y = y[:,0]

m, b = np.polyfit(prediction, y, 1)


pl.plot(prediction, y,'ro')
pl.plot(prediction, m*prediction + b)
Now my question: Is it possible to remove the first fold (Train 0 - Test 1) before plotting?

Thanks in advance!
Remove the first elements from prediction (and y) by slicing when plotting. You can get the length of the first split either by computing it directly with
skip_size = len(X) - 10 * (len(X) // (10 + 1))   # for n_splits=10
or by calling tscv.split again (or you could record it in the first iteration of your for loop):
skip_size = len(next(tscv.split(X)[0]))
After that, it's just:
pl.plot(prediction[skip_size:], y[skip_size:], 'ro')
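To check that the ways of getting skip_size agree, here is a small sketch. It uses a hypothetical stand-in generator (split_indices) that mimics the default TimeSeriesSplit index layout, so it runs without scikit-learn; with the real tscv you would call next(tscv.split(X)) instead. It also shows the third option mentioned above, recording the size during the first loop iteration. The sample count and the toy prediction list are made up for illustration.

```python
def split_indices(n_samples, n_splits):
    """Hypothetical stand-in for tscv.split(X) with default settings."""
    test_size = n_samples // (n_splits + 1)
    for test_start in range(n_samples - n_splits * test_size, n_samples, test_size):
        yield list(range(test_start)), list(range(test_start, test_start + test_size))


n_samples, n_splits = 25, 10

# Approach 1: closed-form size of the first training set.
skip_closed = n_samples - n_splits * (n_samples // (n_splits + 1))

# Approach 2: take the first (train, test) pair from the generator.
skip_next = len(next(split_indices(n_samples, n_splits))[0])

# Approach 3: remember the size during the first iteration of the CV loop.
skip_loop = None
for fold, (train_index, test_index) in enumerate(split_indices(n_samples, n_splits)):
    if fold == 0:
        skip_loop = len(train_index)
    # ... fit, predict and score this fold here ...

print(skip_closed, skip_next, skip_loop)  # prints: 5 5 5

# Slicing then drops the first skip_size points before plotting.
prediction = list(range(n_samples))  # toy data standing in for real predictions
print(prediction[skip_closed:][:3])  # prints: [5, 6, 7]
```

With the default settings all three agree. If you pass a non-default test_size, gap or max_train_size to TimeSeriesSplit, the closed-form expression no longer matches, whereas next(tscv.split(X)) still returns the actual first split.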
Your plot is not piecewise linear, so it seems that your time series is not really a time series (i.e. data points in time order).
Thank you very much!

Is there a difference between the first and the second approach? When I try to use the second one in my loop, I get this error:

TypeError: 'generator' object is not subscriptable
Sorry for the late reply; there was a misplaced ). It should be:
skip_size = len(next(tscv.split(X))[0])
tscv.split(X) returns a generator object; calling next on it returns a tuple of arrays containing the indices of the first train and test split. We want the size of the first train split, so [0] is used to extract the train indices.
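The only difference between the broken and the fixed line is where [0] is applied. A minimal sketch with a plain Python generator (fake_split is a hypothetical stand-in for tscv.split(X)) shows why the first version fails: a generator supports next() but not indexing.

```python
def fake_split():
    """Hypothetical stand-in for tscv.split(X): yields (train, test) index lists."""
    yield [0, 1, 2], [3, 4]
    yield [0, 1, 2, 3, 4], [5, 6]


# Wrong: indexes the generator itself before next() is called.
caught = None
try:
    len(next(fake_split()[0]))
except TypeError as err:
    caught = str(err)
    print("TypeError:", caught)  # 'generator' object is not subscriptable

# Right: next() returns the first (train, test) tuple, then [0] picks train.
skip_size = len(next(fake_split())[0])
print(skip_size)  # prints: 3
```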
