Python Forum
How to split data into trainSet and testSet retaining the index continuous
#1
Hi,
I have a dataset and I want to split it into a trainSet and a testSet of a specified size. I use the function below, but it splits the rows randomly. I want a contiguous split that keeps the rows in their original order:

X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.2, random_state=0)

If my dataset is as below:
1 2 3
2 4 5
3 2 5
4 5 9
5 6 9
6 1 0
7 5 6
8 0 2
9 7 0
10 1 2
I want to split it into a trainSet (80%) and a testSet (20%).
My desired TrainSet:
1 2 3
2 4 5
3 2 5
4 5 9
5 6 9
6 1 0
7 5 6
8 0 2

testSet:
9 7 0
10 1 2

Can anyone kindly help me with how to do this?
#2
I'm not sure what purpose you are building the train and test sets for. I work in deep learning, and there it is good practice to pick the test rows randomly or to pick every third entry.
Nonetheless, for your problem:
You have 10 rows of data, so num_rows = 10 and 0.2 * num_rows == 2.0, which means you have to take the last two rows as the test set. Let's assume that data is a 2-dimensional numpy array holding your data:
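import numpy

data = numpy.array([[1,2,3],[2,4,5],[3,2,5],[4,5,9],[5,6,9],[6,1,0],[7,5,6],[8,0,2],[9,7,0],[10,1,2]])

def train_test_split(data, test_size=0.2, random_state=0):
    # random_state is unused here; it is kept only to mirror the call in the question
    num_rows = data.shape[0]
    # Count from the front: a negative index of 0 would put every row in the test set
    split_index = num_rows - int(round(test_size * num_rows))
    return data[:split_index, :], data[split_index:, :]

train_data, test_data = train_test_split(data)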
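The call in the question returns four values (X_train, X_test, Y_train, Y_test), so here is a minimal sketch of the same idea for separate x and y. The name sequential_split is hypothetical, and treating the first two columns as x and the last column as y is only an assumption for illustration, since the question does not say which column is the target. It works on plain lists and should also work on numpy arrays, because it only uses len() and slicing.

```python
def sequential_split(x, y, test_size=0.2):
    """Return leading rows as train and trailing rows as test, without shuffling."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of rows")
    num_rows = len(x)
    # Number of rows that go to the test set, e.g. 0.2 * 10 -> 2
    num_test = int(round(test_size * num_rows))
    split_index = num_rows - num_test
    return x[:split_index], x[split_index:], y[:split_index], y[split_index:]


# Example data from the question
rows = [[1, 2, 3], [2, 4, 5], [3, 2, 5], [4, 5, 9], [5, 6, 9],
        [6, 1, 0], [7, 5, 6], [8, 0, 2], [9, 7, 0], [10, 1, 2]]

# Assumption for illustration only: first two columns are features, last is the target
x = [row[:2] for row in rows]
y = [row[2] for row in rows]

X_train, X_test, Y_train, Y_test = sequential_split(x, y, test_size=0.2)
print(X_test)  # [[9, 7], [10, 1]]
print(Y_test)  # [0, 2]
```

If the train_test_split in the question is the one from scikit-learn, passing shuffle=False to it should also give an ordered split, so a custom function may not be needed at all.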