 Removing data in a plot
For a project I needed to calculate and display a 10-fold cross-validation on a time series. After plotting, my results look like this:

As you can see, both plots also contain the first fold, which I circled in green. This fold is not noteworthy and I would like to remove it. Because I work with time series data, my 10-fold cross-validation has this structure:

Train 0 - Test 1
Train 1 - Test 2
Train 1,2 - Test 3
Train 1,2,3 - Test 4
Train 1,2,3,4,5,6,7,8,9 - Test 10

My code looks like this:

tscv = TimeSeriesSplit(n_splits=10)

X = mergedf['AnzahlTweets']
y = mergedf['Kurs']


# Cross-validation
for train_index, test_index in tscv.split(X):
   X_train, X_test = X[train_index], X[test_index]
   y_train, y_test = y[train_index], y[test_index]
   # (the lines that fit the model and compute y_pred are missing from the post)
   rmse.append(np.sqrt(metrics.mean_squared_error(y_test, y_pred)))  
   print('RMSE: %.10f' % np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

# Plotting
fig, axes = pl.subplots()
pl.plot(prediction, color='red',label='Predicted',)
pl.setp(axes, xticks=[51,98,145,192,239,286,333,380,427,474,521], xticklabels=['          1','          2','          3', '          4','          5','          6','          7','          8','          9','          10'])

prediction = prediction[:,0]
y = y[:,0]

m, b = np.polyfit(prediction, y, 1)


pl.plot(prediction, y,'ro')
pl.plot(prediction, m*prediction + b)
Now my question: Is it possible to remove the first fold (Train 0 - Test 1) before plotting?

Thanks in advance!
Remove the first elements from prediction (and y) with slicing when plotting. You can get the length of the first split either by computing it directly with
skip_size = len(X) - 10 * (len(X) // (10 + 1))   # for n_splits=10
or by using tscv.split again (or you could do it in the first iteration of your for loop ...)
skip_size = len(next(tscv.split(X)[0]))
After that it's just
pl.plot(prediction[skip_size:], y[skip_size:], 'ro')
Your plot is not piecewise linear, so it seems that your data is not actually a time series (= data points in time order).
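To make the direct computation concrete, here is a minimal sketch in plain Python. It assumes the default TimeSeriesSplit layout (each test block has len(X) // (n_splits + 1) samples and the first training block takes whatever is left over) and a series of 521 samples, a figure picked only because it matches the tick positions in the question. The helper first_train_size and the stand-in lists are hypothetical, not part of the original code.

```python
def first_train_size(n_samples, n_splits=10):
    """Length of the first training block, assuming the default
    TimeSeriesSplit layout (equal test blocks, remainder goes to the
    first training block)."""
    test_size = n_samples // (n_splits + 1)
    return n_samples - n_splits * test_size


n_samples = 521  # assumed length, chosen to match the xticks in the question
skip_size = first_train_size(n_samples)

# Stand-in data; in the thread these would be the prediction and y arrays.
prediction = list(range(n_samples))
y = [2 * value for value in prediction]

# Drop everything that belongs to the first fold before plotting.
prediction_trimmed = prediction[skip_size:]
y_trimmed = y[skip_size:]

print(skip_size, len(prediction_trimmed))
```

With these assumed numbers the first 51 points are dropped, which lines up with the first tick position (51) used in the question's pl.setp call.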
Thank you very much!

Is there a difference between the first and the second approach? When I try to implement the second one into my loop I get this message:

TypeError: 'generator' object is not subscriptable
Sorry for the late reply. There was a misplaced ), it should be:
skip_size = len(next(tscv.split(X))[0])
tscv.split(X) returns a generator object; calling next on it returns a tuple of arrays containing the indices of the first train and test split. We want the size of the first train split, so [0] is used to extract the train split.
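The difference between indexing the generator and indexing the tuple returned by next can be reproduced without scikit-learn. The sketch below uses a hypothetical generator, toy_split, that only mimics the shape of what tscv.split yields (pairs of train and test index lists); it is an illustration under that assumption, not the library's implementation.

```python
def toy_split(n_samples, n_splits):
    """Hypothetical stand-in for tscv.split: yields (train, test) index lists
    with an expanding training window."""
    test_size = n_samples // (n_splits + 1)
    first_train = n_samples - n_splits * test_size
    for k in range(n_splits):
        train_end = first_train + k * test_size
        yield list(range(train_end)), list(range(train_end, train_end + test_size))


# Indexing the generator itself fails, as in the error message above.
try:
    toy_split(521, 10)[0]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Calling next first gives the (train, test) tuple, which can be indexed.
train_idx, test_idx = next(toy_split(521, 10))
skip_size = len(train_idx)

print(error_message)
print(skip_size, len(test_idx))
```

Both approaches should give the same skip_size as long as the splitter uses the default layout, so the choice between them is mostly a matter of taste.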
