Python Forum
Partitioning when splitting data into train and test-dataset
[Image: mDDYhdn]
This image shows a simplified example of what my dataset looks like.

My goal is to build a text classifier that predicts whether a paragraph from a document has one or more labels (multi-label classification), but my very first step is to split the data into train and test data. The CSV file with the data contains many paragraphs from multiple documents.
The issue is that I need to make the split at the document level, so that no document has some of its paragraphs in the train set and others in the test set.

I know how sklearn's train_test_split() works, but I also need to make sure that documents in the train set do not appear in the test set. I've already researched this but still have no clue how to do it :/.

Could anyone help me by telling me how I can make this happen? I would really appreciate it.
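
To make the requirement concrete, here is a minimal sketch of a document-level split using only the standard library. It is one possible approach, not a definitive one. The column names `doc_id`, `paragraph` and `labels`, the helper `split_by_document` and the sample rows are all assumptions made for illustration, since the real column names are only visible in the image.

```python
import random


def split_by_document(rows, doc_key="doc_id", test_size=0.25, seed=0):
    """Split rows so that every document ends up entirely in train or entirely in test.

    `doc_key` is an assumed column name; `test_size` is the fraction of
    documents (not paragraphs) that go to the test set.
    """
    # Collect the unique document ids and shuffle them reproducibly.
    doc_ids = sorted({row[doc_key] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(doc_ids)

    # Reserve a share of whole documents for the test set (at least one).
    n_test = max(1, round(len(doc_ids) * test_size))
    test_docs = set(doc_ids[:n_test])

    # Assign each paragraph according to the document it belongs to.
    train = [row for row in rows if row[doc_key] not in test_docs]
    test = [row for row in rows if row[doc_key] in test_docs]
    return train, test


# Hypothetical sample data: several paragraphs per document, multiple labels.
rows = [
    {"doc_id": "doc_a", "paragraph": "first paragraph", "labels": ["x"]},
    {"doc_id": "doc_a", "paragraph": "second paragraph", "labels": ["x", "y"]},
    {"doc_id": "doc_b", "paragraph": "first paragraph", "labels": []},
    {"doc_id": "doc_b", "paragraph": "second paragraph", "labels": ["y"]},
    {"doc_id": "doc_c", "paragraph": "first paragraph", "labels": ["z"]},
    {"doc_id": "doc_d", "paragraph": "first paragraph", "labels": ["x"]},
    {"doc_id": "doc_d", "paragraph": "second paragraph", "labels": ["z"]},
]

train_rows, test_rows = split_by_document(rows)
```

Because the split is made over document ids, the share of paragraphs in the test set can differ from `test_size` when documents have different lengths. scikit-learn also offers `GroupShuffleSplit` in `sklearn.model_selection`, whose `split()` method accepts a `groups` argument for the same purpose; passing the document id column as `groups` should give an equivalent split.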