Python Forum
How to define train set and test set
#1
Hi,
I am using the random forest method for regression.

I use the command below:
X_train,X_test,Y_train,Y_test=train_test_split(x,y,test_size=0.3,random_state=0)

With the above command, the data is split randomly, but I want to take the first 70% as the train set and the next 30% as the test set.

How can I do this?
Reply
#2
I assume you're using sklearn here.

The train_test_split function randomly splits your data; that is its purpose. Specifying random_state=0 only makes the shuffle reproducible, so you always get the same output for the same input. If your data is already in the order you want, you can just split it yourself using slicing.
Reply
#3
Yes, I am using sklearn.
My call is as below:
X_train,X_test,Y_train,Y_test=train_test_split(x,y,test_size=0.3,random_state=0)
My data has 1000 rows, and I want the first 700 as train data and the next 300 as test data,
but the above command splits it randomly.
Reply
#4
(Mar-05-2018, 01:39 PM)Raj Wrote: Yes, I am using sklearn
My call is as below:
X_train,X_test,Y_train,Y_test=train_test_split(x,y,test_size=0.3,random_state=0)
My data has 1000 rows, and I want the first 700 as train data and the next 300 as test data,
but the above command splits it randomly.

As I said, train_test_split() is implemented to break up the data randomly. If you don't want it random, don't use the function. x and y are numpy arrays, correct? If yes, you can just slice them: https://docs.scipy.org/doc/numpy/referen...exing.html
Reply
#5
Sir,

I could not find an example. Do you have the exact command (code) to do this?
Reply
#6
Here's a simple example of slicing a numpy array...
>>> import numpy as np
>>> dataset = np.array([[1,2,3,4],[2,3,4,5],[3,4,5,6],[4,5,6,7],[5,6,7,8]])
>>> dataset
array([[1, 2, 3, 4],
       [2, 3, 4, 5],
       [3, 4, 5, 6],
       [4, 5, 6, 7],
       [5, 6, 7, 8]])
>>> np.shape(dataset)
(5, 4)
>>> train_data = dataset[:3]
>>> train_data
array([[1, 2, 3, 4],
       [2, 3, 4, 5],
       [3, 4, 5, 6]])
>>> test_data = dataset[3:]
>>> test_data
array([[4, 5, 6, 7],
       [5, 6, 7, 8]])
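
For the 1000-row case asked about earlier in the thread, the same slicing can be applied to x and y together. This is only a minimal sketch: it assumes x and y have the same length and support slicing (plain lists or numpy arrays both work), and ordered_split is a hypothetical helper name, not something provided by sklearn. The toy x and y are made up for illustration.

```python
def ordered_split(x, y, train_fraction=0.7):
    """Split x and y in their existing order, without shuffling.

    Hypothetical helper (not part of sklearn). Returns the pieces in the
    same order as train_test_split: X_train, X_test, Y_train, Y_test.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of rows")
    # Index of the first test row, e.g. 700 for 1000 rows and 0.7.
    n_train = round(len(x) * train_fraction)
    return x[:n_train], x[n_train:], y[:n_train], y[n_train:]


# Toy data standing in for the real features and targets (assumption).
x = list(range(1000))
y = [2 * value for value in x]

X_train, X_test, Y_train, Y_test = ordered_split(x, y, train_fraction=0.7)
print(len(X_train), len(X_test))  # 700 300
```

With numpy arrays the calls are identical, since x[:n_train] slices along the first axis (rows). Recent scikit-learn versions also accept shuffle=False in train_test_split, which should give the same ordered split without a custom helper.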
Reply
#7
OK, Thanks.
Reply

