Python Forum

sklearn and train_test_split
Hey everyone,

Could someone better explain train_test_split and what it's actually doing?

So from my understanding:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
When you give train_test_split (TTS) an X variable, that is the dataset (the features) and y is what you are trying to predict, correct? And TTS will split up your dataset on its own and hand back X_train, X_test, y_train, and y_test, so the training part can be used to train a model? And test_size is how big a share of your dataset is set aside to test on? Or am I completely off the mark?
Your dataset has X and y variables in it, say days and COVID-19 cases. These are all known values.
You send it to TTS, which shuffles the rows at random and splits them into a train set and a test set; with test_size=0.33, about a third of the rows are held out for testing. You then train on the train values. Once trained, you can validate your model by testing it against the test set. So, when it predicts that on day 27 you should have 500 cases and the actual value is 600, you have an error of 100. You can then get statistics (typically the MSE, mean squared error) on how closely your model's predictions match reality.
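
A minimal sketch of that workflow, in case it helps to see it run. The data here is made up (100 days with a noisy, roughly linear case count), and LinearRegression is only an assumed stand-in for whatever model you actually use; random_state is set just so the split is repeatable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Made-up data for illustration: 100 days with a noisy, roughly linear case count.
rng = np.random.default_rng(0)
days = np.arange(1, 101)
cases = 20 * days + rng.normal(0, 50, size=days.size)

X = days.reshape(-1, 1)  # scikit-learn expects 2-D features: (n_samples, n_features)
y = cases

# Shuffle the rows and hold out about a third of them for testing.
# random_state is fixed only so the split is repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Train on the training rows only.
model = LinearRegression()
model.fit(X_train, y_train)

# Predict the held-out rows and compare with the known values.
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)

print("train rows:", len(X_train), "test rows:", len(X_test))
print("MSE on the test set:", mse)
```

The model never sees X_test or y_test during fit, which is why the MSE on those rows is a fair estimate of how it does on data it has not seen.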

Once you are comfortable with your model, you can use it to predict new values: what will we be looking at on day 300?
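
As a rough sketch of that last step, assuming made-up, noise-free data (exactly 20 cases per day) and a LinearRegression model chosen only for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up, noise-free data for illustration: exactly 20 cases per day.
days = np.arange(1, 101).reshape(-1, 1)
cases = 20 * days.ravel()

model = LinearRegression().fit(days, cases)

# predict expects a 2-D array, even for a single new value.
day_300 = np.array([[300]])
predicted_cases = model.predict(day_300)[0]
print("predicted cases on day 300:", predicted_cases)
```

Keep in mind that day 300 is far outside the days 1 to 100 the model was trained on, so the number is only as good as the assumption that the same trend continues.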