Python Forum
How to define train set and test set - Printable Version




How to define train set and test set - Raj - Mar-02-2018

Hi,
I am using a random forest for regression.

I use the command below:
X_train,X_test,Y_train,Y_test=train_test_split(x,y,test_size=0.3,random_state=0)

With the above command, the data is split randomly, but I want to take the first 70% as the train set and the next 30% as the test set.

How can I do this?


RE: How to define train set and test set - mpd - Mar-02-2018

I assume you're using sklearn here.

The train_test_split function randomly breaks up your data; that is its purpose. By specifying random_state=0, you will always get the same output for the same input. If your data is already in the order you want, you can just split it up yourself using slicing.


RE: How to define train set and test set - Raj - Mar-05-2018

Yes, I am using sklearn.
My call is as below:
X_train,X_test,Y_train,Y_test=train_test_split(x,y,test_size=0.3,random_state=0)
My data size is 1000, and I want to use the first 700 rows as train data and the next 300 rows as test data.
But with the above command, it splits randomly.


RE: How to define train set and test set - mpd - Mar-05-2018

(Mar-05-2018, 01:39 PM)Raj Wrote: Yes, I am using sklearn.
My call is as below:
X_train,X_test,Y_train,Y_test=train_test_split(x,y,test_size=0.3,random_state=0)
My data size is 1000, and I want to use the first 700 rows as train data and the next 300 rows as test data.
But with the above command, it splits randomly.

As I said, train_test_split() is implemented to break up the data randomly. If you don't want it random, don't use the function. x and y are numpy arrays, correct? If yes, you can just slice them: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html


RE: How to define train set and test set - Raj - Mar-07-2018

Sir,

I could not find an example. Do you have a precise command (code) to do this?


RE: How to define train set and test set - mpd - Mar-07-2018

Here's a simple example of slicing a numpy array...
>>> import numpy as np
>>> dataset = np.array([[1,2,3,4],[2,3,4,5],[3,4,5,6],[4,5,6,7],[5,6,7,8]])
>>> dataset
array([[1, 2, 3, 4],
       [2, 3, 4, 5],
       [3, 4, 5, 6],
       [4, 5, 6, 7],
       [5, 6, 7, 8]])
>>> np.shape(dataset)
(5, 4)
>>> train_data = dataset[:3]
>>> train_data
array([[1, 2, 3, 4],
       [2, 3, 4, 5],
       [3, 4, 5, 6]])
>>> test_data = dataset[3:]
>>> test_data
array([[4, 5, 6, 7],
       [5, 6, 7, 8]])
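
Applied to the 1000-row case described earlier in the thread, the same slicing might look like the sketch below. The x and y arrays here are made-up stand-ins (1000 samples with 4 features each), not the poster's real data, and the 70/30 cut point is computed with integer arithmetic so it lands on exactly 700 rows.

```python
import numpy as np

# Hypothetical stand-in data (an assumption, not from the thread):
# 1000 samples with 4 features each, plus one target value per sample.
x = np.arange(4000).reshape(1000, 4)
y = np.arange(1000)

# Row index where the test set starts: 70% of the samples.
# Integer arithmetic avoids any floating-point rounding surprises.
split = (len(x) * 70) // 100

# Ordered split: the first 70% of rows for training, the rest for testing.
X_train, X_test = x[:split], x[split:]
Y_train, Y_test = y[:split], y[split:]

print(X_train.shape, X_test.shape)
print(Y_train.shape, Y_test.shape)
```

Slicing x and y at the same index keeps each feature row paired with its target. If you would rather stay with sklearn, train_test_split also accepts shuffle=False, which should produce the same ordered split; check the documentation for your installed version.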



RE: How to define train set and test set - Raj - Mar-08-2018

OK, Thanks.