 Removing data in a plot
For a project I needed to calculate and display a 10-fold cross-validation on a time series. After plotting, my results look like this:

As you can see, both plots also contain the first fold, which I circled in green. This fold is not noteworthy and I would like to remove it. Because I work with time-series data, my 10-fold cross-validation has this structure:

Train 0 - Test 1
Train 0,1 - Test 2
Train 0,1,2 - Test 3
Train 0,1,2,3 - Test 4
...
Train 0,1,2,3,4,5,6,7,8,9 - Test 10
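To make this expanding-window layout concrete, here is a minimal pure-Python sketch that mimics how scikit-learn's TimeSeriesSplit assigns indices with its default settings (no gap, test_size or max_train_size). The helper expanding_window_splits is hypothetical and only for illustration; it is not part of scikit-learn.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train, test) index lists, mimicking TimeSeriesSplit defaults (sketch)."""
    test_size = n_samples // (n_splits + 1)               # every test block has this size
    first_test_start = n_samples - n_splits * test_size  # leftover samples go into the first train block
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train = list(range(test_start))                   # everything before the test block
        test = list(range(test_start, test_start + test_size))
        yield train, test


# Print the fold structure for a small, made-up series of 22 samples
for fold, (train, test) in enumerate(expanding_window_splits(22, 10), start=1):
    print(f"Fold {fold}: train {train[0]}..{train[-1]} ({len(train)}), "
          f"test {test[0]}..{test[-1]}")
```

Each fold's training block contains every sample before its test block, so the training data only ever grows and never includes future points.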

My code looks like this:

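import numpy as np
import matplotlib.pyplot as pl
from sklearn import metrics
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=10)

X = mergedf[['AnzahlTweets']]  # 2-D, as scikit-learn estimators expect
y = mergedf['Kurs']

rmse = []  # one RMSE value per fold

# Cross-validation
for train_index, test_index in tscv.split(X):
    # split() returns positions, so select rows with .iloc
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    # `model` is the regressor; its definition (and the exact fit call) is not shown in the post
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    rmse.append(np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
    print('RMSE: %.10f' % rmse[-1])

# Plotting
# `prediction` holds the model's predictions; how it is built is not shown in the post
prediction = np.asarray(prediction).reshape(-1)  # flatten shape (n, 1) to (n,)
y = np.asarray(y).reshape(-1)

fig, axes = pl.subplots()
pl.plot(prediction, color='red', label='Predicted')
# 11 tick positions need 11 labels; the padding nudges each fold label to the right of its tick
pl.setp(axes, xticks=[51, 98, 145, 192, 239, 286, 333, 380, 427, 474, 521],
        xticklabels=[' ' * 10 + str(i) for i in range(1, 11)] + [''])

m, b = np.polyfit(prediction, y, 1)  # least-squares regression line

pl.plot(prediction, y, 'ro')
pl.plot(prediction, m * prediction + b)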
Now my question: Is it possible to remove the first fold (Train 0 - Test 1) before plotting?

Thanks in advance!
Remove the first elements from prediction (and y) by slicing when plotting. You can get the length of the first (training) split either by computing it directly with
skip_size = len(X) - 10 * (len(X) // (10 + 1))   # for n_splits=10
or by calling tscv.split again (or by recording it in the first iteration of your for loop):
skip_size = len(next(tscv.split(X)[0]))
After that, it's just
pl.plot(prediction[skip_size:], y[skip_size:], 'ro')
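The slicing step can be sketched with NumPy as follows. It assumes prediction and y are 1-D arrays of equal length in time order; the helper drop_first_fold and the synthetic data are made up for illustration. The matplotlib calls are left out so the snippet runs without a display; pass the sliced arrays to pl.plot as above.

```python
import numpy as np


def drop_first_fold(prediction, y, n_splits=10):
    """Return prediction and y without the first split's samples (sketch)."""
    n = len(y)
    skip_size = n - n_splits * (n // (n_splits + 1))  # size of the first training block
    return prediction[skip_size:], y[skip_size:]


# Synthetic, illustration-only data: y is exactly 2 * prediction + 1
prediction = np.linspace(0.0, 10.0, 523)
y = 2.0 * prediction + 1.0

p_rest, y_rest = drop_first_fold(prediction, y)
m, b = np.polyfit(p_rest, y_rest, 1)  # regression line fitted without the first fold
print(len(p_rest), m, b)
```

Slicing both arrays with the same skip_size keeps each prediction paired with its true value, so the regression line is fitted only on the remaining folds.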
Also, your plot is not piecewise linear, which suggests that your data points are not in time order, so it may not really be a time series.
Thank you very much!

Is there a difference between the first and the second approach? When I try to implement the second one in my loop, I get this error:

TypeError: 'generator' object is not subscriptable
Sorry for the late reply; there was a misplaced ). It should be:
skip_size = len(next(tscv.split(X))[0])
tscv.split(X) returns a generator object; calling next on it returns a tuple of two arrays containing the indices of the first train and test split. We want the size of the first train split, so [0] extracts the train indices. With the default TimeSeriesSplit settings, both approaches give the same number.
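The difference between the two spellings can be reproduced with a plain generator. The function split_indices below is a hypothetical stand-in for tscv.split(X) that yields (train, test) index lists; each call creates a fresh generator, so calling it a second time does not disturb a loop that is already iterating over another one.

```python
def split_indices():
    """Hypothetical stand-in for tscv.split(X): yields (train, test) index lists."""
    yield [0, 1, 2], [3, 4]
    yield [0, 1, 2, 3, 4], [5, 6]


try:
    split_indices()[0]  # generators do not support indexing
except TypeError as exc:
    print("TypeError:", exc)

first_train, first_test = next(split_indices())  # next() returns the first (train, test) tuple
skip_size = len(first_train)                      # equivalent to len(next(gen)[0])
print(skip_size)
```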

