Python Forum

How to split data into trainSet and testSet while keeping the index continuous
Hi,
I have a dataset and I want to split it into a trainSet and a testSet of a specified size. I use the function below, but it splits the rows randomly; I want the split to keep the rows in their original, continuous order:

X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.2, random_state=0)

If my dataSet is as below:
1 2 3
2 4 5
3 2 5
4 5 9
5 6 9
6 1 0
7 5 6
8 0 2
9 7 0
10 1 2
I want to split it into a trainSet (80%) and a testSet (20%).
My desired trainSet:
1 2 3
2 4 5
3 2 5
4 5 9
5 6 9
6 1 0
7 5 6
8 0 2

testSet:
9 7 0
10 1 2

Can anyone kindly help me with how to do this?
I'm not sure what purpose you are building the test and train sets for. I work in the field of deep learning, and there it is good to pick the test rows randomly or to pick every third entry.
But nonetheless, for your problem:
You have 10 rows of data, so num_rows = 10 and 0.2 * num_rows == 2.0, which means you have to take the last two rows as the test set. Let's assume that data is a 2-dimensional numpy array holding your data:
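import numpy

data = numpy.array([[1,2,3],[2,4,5],[3,2,5],[4,5,9],[5,6,9],[6,1,0],[7,5,6],[8,0,2],[9,7,0],[10,1,2]])

def train_test_split(data, test_size=0.2, random_state=0):
        # random_state is unused; it is kept only to mirror the scikit-learn signature
        num_rows = data.shape[0]
        # Count from the front so that a test set of 0 rows does not leave the train set empty
        split_index = num_rows - int(round(test_size * num_rows))
        return data[:split_index,:], data[split_index:,:]

train_data, test_data = train_test_split(data)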
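
The call in the question splits x and y separately and returns four arrays, so here is a minimal sketch of the same idea for that case. The name split_sequential is made up for this example, and treating the last column of the sample data as y is only an assumption about how the data is laid out. As a side note, scikit-learn's own train_test_split accepts shuffle=False, which should give the same ordered split without a custom function.

```python
import numpy as np

def split_sequential(x, y, test_size=0.2):
    """Split x and y at the same row index, keeping the original row order.

    Hypothetical helper for illustration, not part of any library.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of rows")
    n_test = int(round(test_size * len(x)))
    split = len(x) - n_test  # the first `split` rows form the training set
    return x[:split], x[split:], y[:split], y[split:]

# Sample data from the question. Assumption: the last column is the target y,
# and the first two columns are the features x.
data = np.array([[1, 2, 3], [2, 4, 5], [3, 2, 5], [4, 5, 9], [5, 6, 9],
                 [6, 1, 0], [7, 5, 6], [8, 0, 2], [9, 7, 0], [10, 1, 2]])
x, y = data[:, :2], data[:, 2]

X_train, X_test, Y_train, Y_test = split_sequential(x, y, test_size=0.2)
print(X_train.shape)  # (8, 2)
print(X_test)         # rows 9 and 10 of the sample data
print(Y_test)
```

With test_size=0.2 and 10 rows, the first 8 rows end up in the train arrays and the last 2 rows in the test arrays, which matches the desired output in the question.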