Removing data in a plot
#1
For a project I needed to calculate and display a 10-fold cross-validation on a time series. After plotting, my results look like this:

http://imgur.com/a/15wbF

As you can see, both plots also contain the first fold, which I circled in green. This fold is not noteworthy and I would like to remove it. Because I work with time-series data, my 10-fold cross-validation has this structure:

Train 0 - Test 1
Train 1 - Test 2
Train 1,2 - Test 3
Train 1,2,3 - Test 4
...
Train 1,2,3,4,5,6,7,8,9 - Test 10
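
For illustration, the expanding-window scheme listed above can be reproduced on toy data. This is only a sketch: it assumes scikit-learn's TimeSeriesSplit with default settings, and X_demo and tscv_demo are hypothetical names with made-up data, not the data from this post. It prints the size of each train/test split.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical toy data: 22 samples in time order (not the poster's data).
X_demo = np.arange(22).reshape(-1, 1)
tscv_demo = TimeSeriesSplit(n_splits=10)

split_sizes = []
for train_index, test_index in tscv_demo.split(X_demo):
    # The training window grows each round; the test block always follows it.
    split_sizes.append((len(train_index), len(test_index)))
    print('train: %2d samples, test: %d samples' % split_sizes[-1])
```

The first training block is never part of any test block, so no prediction is ever written for those indices.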

My code looks like this:

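import numpy as np
import matplotlib.pyplot as pl  # assumed: pl is used as the pyplot alias below
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=10)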

X = mergedf['AnzahlTweets']
y = mergedf['Kurs']

X=X.values.reshape(-1,1)
y=y.values.reshape(-1,1)

# Cross-validation
linreg=LinearRegression()
rmse=[]
prediction=np.zeros(y.shape)
for train_index, test_index in tscv.split(X):
   X_train, X_test = X[train_index], X[test_index]
   y_train, y_test = y[train_index], y[test_index]
   linreg.fit(X_train,y_train)
   y_pred=linreg.predict(X_test)
   prediction[test_index]=y_pred
   rmse.append(np.sqrt(metrics.mean_squared_error(y_test, y_pred)))  
   print('RMSE: %.10f' % np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

# Plotting
fig, axes = pl.subplots()
pl.plot(y,label='Actual')
pl.plot(prediction, color='red',label='Predicted',)
pl.ylabel('Price')
pl.xlabel('Fold')
pl.gca().xaxis.grid(True)
# one label per tick; the last tick (end of fold 10) is left blank
pl.setp(axes, xticks=[51,98,145,192,239,286,333,380,427,474,521], xticklabels=['          1','          2','          3', '          4','          5','          6','          7','          8','          9','          10',''])
pl.legend()
pl.show()

prediction = prediction[:,0]
y = y[:,0]

m, b = np.polyfit(prediction, y, 1)

plrange=np.arange(0,0.000001,0.00000005)

pl.plot(prediction, y,'ro')
pl.plot(prediction, m*prediction + b)
pl.xlabel('Predicted')
pl.ylabel('Actual')
pl.xlim()
pl.gca().xaxis.grid(True)
pl.show()
Now my question: Is it possible to remove the first fold (Train 0 - Test 1) before plotting?

Thanks in advance!
#2
Remove the first elements from prediction (and y) by slicing when plotting. You can get the length of the first split either by computing it directly with
skip_size = len(X) - 10 * (len(X) // (10 + 1))   # for n_splits=10
or by using tscv.split again (or you could do it in the first iteration of your for loop ...)
skip_size = len(next(tscv.split(X)[0]))
After that it's just
pl.plot(y[skip_size:])
...
pl.plot(prediction[skip_size:], y[skip_size:], 'ro')
Your plot is not piecewise linear, so it seems that your time series is not a time series (= data points in time order).
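
A minimal sketch of the slicing step, using made-up arrays in place of the real y and prediction. It assumes the default TimeSeriesSplit behaviour, where every test block has len(X) // (n_splits + 1) samples and the first training block takes whatever remains at the front. n_samples, y_plot, prediction_plot and the array contents are hypothetical.

```python
import numpy as np

n_splits = 10
n_samples = 25  # hypothetical length, chosen so it does not divide evenly

# Stand-ins for the real data: y holds the actual values, prediction starts
# as zeros and is only filled in for indices that belong to a test block.
y = np.arange(n_samples, dtype=float)
prediction = np.zeros(n_samples)

test_size = n_samples // (n_splits + 1)
skip_size = n_samples - n_splits * test_size  # length of the first training block
prediction[skip_size:] = y[skip_size:] + 0.5  # pretend every test block was predicted

# Slice both arrays the same way so they stay aligned when plotted.
y_plot = y[skip_size:]
prediction_plot = prediction[skip_size:]
print(skip_size, len(y_plot), len(prediction_plot))
```

If the x-axis tick positions were chosen for the full-length arrays, they would likely need to be shifted by skip_size as well so the fold boundaries still line up.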
#3
Thank you very much!

Is there a difference between the first and the second approach? When I try to implement the second one into my loop I get this message:

Error:
TypeError: 'generator' object is not subscriptable
#4
Sorry for the late reply. There was a misplaced ); it should be:
skip_size = len(next(tscv.split(X))[0])
tscv.split(X) returns a generator object; calling next on it returns a tuple of arrays containing the indices of the first train and test split. We want the size of the first train split, so [0] is used to extract the train indices.
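
The difference between the two forms can be checked directly. A small sketch, assuming scikit-learn is installed and using a hypothetical 25-sample array (X_demo and tscv_demo are made-up names); it also compares the result against the arithmetic formula from the earlier reply.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X_demo = np.arange(25).reshape(-1, 1)  # hypothetical data
tscv_demo = TimeSeriesSplit(n_splits=10)

# Subscripting the generator itself fails, which is the error reported above.
try:
    tscv_demo.split(X_demo)[0]
    subscript_failed = False
except TypeError:
    subscript_failed = True

# next() pulls the first (train_index, test_index) pair from a fresh generator.
first_train, first_test = next(tscv_demo.split(X_demo))
skip_size = len(first_train)

# Same quantity computed arithmetically, for n_splits=10.
formula_size = len(X_demo) - 10 * (len(X_demo) // (10 + 1))
print(subscript_failed, skip_size, formula_size)
```

Each call to split creates a new generator, so taking the first pair this way does not consume anything from the cross-validation loop. With default settings both approaches should give the same number.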