How to define train set and test set
Python Forum (https://python-forum.io), Forum: Data Science, Thread: /thread-8663.html
How to define train set and test set - Raj - Mar-02-2018
Hi, I am using the random forest method for regression, and I use the command below:
X_train,X_test,Y_train,Y_test=train_test_split(x,y,test_size=0.3,random_state=0)
With the above command the data is split randomly, but I want to take the first 70% as the train set and the next 30% as the test set. How can I do this?

RE: How to define train set and test set - mpd - Mar-02-2018
I assume you're using sklearn here. The train_test_split function randomly breaks up your data; that is its purpose. By specifying random_state=0, you will always get the same output for the same input. If your data is already in the order you want, you can just split it up yourself using slicing and what-not.
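As a rough illustration of splitting the data yourself by position, here is a minimal sketch. It assumes x and y are numpy arrays with one sample per row; the helper name ordered_split and the stand-in data are hypothetical and are not part of sklearn or of the original posts.

```python
import numpy as np

def ordered_split(x, y, test_size=0.3):
    """Keep the existing order: leading rows become train, trailing rows test."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of samples")
    # Round to the nearest whole sample to avoid floating-point surprises.
    n_test = int(round(len(x) * test_size))
    cut = len(x) - n_test
    return x[:cut], x[cut:], y[:cut], y[cut:]

# Hypothetical stand-in data: 10 samples with 2 features each.
x = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_test, Y_train, Y_test = ordered_split(x, y, test_size=0.3)
print(X_train.shape, X_test.shape)
```

If you would rather stay with sklearn, recent versions of train_test_split also accept shuffle=False, which should give the same kind of ordered split; check the documentation for your installed version.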
RE: How to define train set and test set - Raj - Mar-05-2018
Yes, I am using sklearn. My call is as below:
X_train,X_test,Y_train,Y_test=train_test_split(x,y,test_size=0.3,random_state=0)
My data size is 1000, and I want to use the first 700 samples as train data and the next 300 samples as test data. But the above command splits the data randomly.

RE: How to define train set and test set - mpd - Mar-05-2018
(Mar-05-2018, 01:39 PM)Raj Wrote: Yes, I am using sklearn
As I said, train_test_split() is implemented to break up the data randomly. If you don't want it random, don't use the function. x and y are numpy arrays, correct? If yes, you can just slice them: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
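For the 1000-sample case described above, the slicing could look something like the sketch below. The contents and shapes of x and y are made up for illustration (3 features is an assumption); only the 700/300 split comes from the question.

```python
import numpy as np

# Hypothetical stand-in data: 1000 samples, 3 features, one target per sample.
x = np.arange(3000, dtype=float).reshape(1000, 3)
y = np.arange(1000, dtype=float)

# First 700 rows for training, remaining 300 rows for testing.
X_train, X_test = x[:700], x[700:]
Y_train, Y_test = y[:700], y[700:]

print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
```

Note that basic slices like these are views onto the original arrays, so call .copy() on them if you plan to modify the train or test data independently.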
RE: How to define train set and test set - Raj - Mar-07-2018
Sir, I could not find an example. Do you have a precise command (code) to do this?

RE: How to define train set and test set - mpd - Mar-07-2018
Here's a simple example of slicing a numpy array...
>>> import numpy as np
>>> dataset = np.array([[1,2,3,4],[2,3,4,5],[3,4,5,6],[4,5,6,7],[5,6,7,8]])
>>> dataset
array([[1, 2, 3, 4],
       [2, 3, 4, 5],
       [3, 4, 5, 6],
       [4, 5, 6, 7],
       [5, 6, 7, 8]])
>>> np.shape(dataset)
(5, 4)
>>> train_data = dataset[:3]
>>> train_data
array([[1, 2, 3, 4],
       [2, 3, 4, 5],
       [3, 4, 5, 6]])
>>> test_data = dataset[3:]
>>> test_data
array([[4, 5, 6, 7],
       [5, 6, 7, 8]])

RE: How to define train set and test set - Raj - Mar-08-2018
OK, Thanks.