Basic data analysis and predictions

jefsummers · Mar-07-2020, 03:54 PM

A common choice with small datasets is an 80-20 train/test split. If you want separate train, validation, and test sets, it would be more like 60-20-20. Note that you are not supposed to adjust the parameters to fix predictions on your test set. Instead, train on the training set, check the results on the validation set, and go back and adjust (to avoid overfitting, etc.). When you are done, show that you did a good job by running predictions on the test set. With a small dataset this may be hard, so you may have to compromise and use only a validation set or only a test set, though you will need to explain that in your paper.
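If you want exactly 60-20-20 rather than nesting two 80-20 splits, one option is to shuffle the row positions once and cut them at 60% and 80%. This is a minimal sketch, assuming a pandas DataFrame; the function name split_60_20_20 and the toy dataframe are hypothetical.

```python
import numpy as np
import pandas as pd

def split_60_20_20(df, seed=42):
    """Hypothetical helper: shuffle rows once, then cut into 60% train, 20% validate, 20% test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(df))   # shuffled row positions 0..len(df)-1
    n_train = len(df) * 60 // 100      # integer arithmetic avoids float rounding surprises
    n_val = len(df) * 20 // 100
    train = df.iloc[order[:n_train]]
    validate = df.iloc[order[n_train:n_train + n_val]]
    test = df.iloc[order[n_train + n_val:]]  # the remainder (including any leftover rows) goes to test
    return train, validate, test

# Hypothetical toy dataframe with 100 rows
df = pd.DataFrame({"x": range(100)})
train, validate, test = split_60_20_20(df)
print(len(train), len(validate), len(test))  # 60 20 20
```

Because the cut is done by position with iloc, this version does not depend on the dataframe having a unique index.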
So here is an example from one of my projects:

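import pandas as pd  # df below is assumed to be an existing pandas DataFrame

# Hold out 20% of the rows as the test set
trainval_dataset = df.sample(frac=0.8, random_state=42)
test_dataset = df.drop(trainval_dataset.index)
# Split the remaining 80% again: 80% train, 20% validate
train_dataset = trainval_dataset.sample(frac=0.8, random_state=42)
validate_dataset = trainval_dataset.drop(train_dataset.index)
print(f"Train {train_dataset.shape} Validate {validate_dataset.shape} Test {test_dataset.shape}")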

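trainval_dataset holds the training and validation rows, and test_dataset is the test set (whatever remains of the full dataframe after removing trainval_dataset). trainval_dataset is then split again into training and validation, so you get 3 sets. Because each step keeps 80%, the result is roughly 64-16-20 rather than exactly 60-20-20. Also note that drop() removes rows by index label, so this relies on df having a unique index.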
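A quick way to confirm those proportions, and that no row ends up in two sets, is to run the same split on a toy dataframe and check the indices. The 100-row dataframe here is hypothetical; with a real df the counts will differ, but the overlap checks still apply.

```python
import pandas as pd

# Hypothetical 100-row dataframe with the default (unique) RangeIndex
df = pd.DataFrame({"year": range(1920, 2020), "population": range(100)})

trainval_dataset = df.sample(frac=0.8, random_state=42)
test_dataset = df.drop(trainval_dataset.index)
train_dataset = trainval_dataset.sample(frac=0.8, random_state=42)
validate_dataset = trainval_dataset.drop(train_dataset.index)

# 0.8 of 100 rows = 80, then 0.8 of 80 rows = 64
print(len(train_dataset), len(validate_dataset), len(test_dataset))  # 64 16 20

# Sizes add up to len(df) and the union covers every index label,
# so (with a unique index) the three sets cannot overlap
parts = [set(d.index) for d in (train_dataset, validate_dataset, test_dataset)]
assert sum(len(p) for p in parts) == len(df)
assert set().union(*parts) == set(df.index)
```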
A seed of 42 is traditional; besides being the answer to life, the universe, and everything, it carries no special meaning. Fixing random_state simply makes the split the same every time you run the code.
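To see what the fixed seed buys you, sample the same toy dataframe twice with the same random_state and compare the selected rows. The dataframe is hypothetical, and 7 is just an arbitrary second seed.

```python
import pandas as pd

df = pd.DataFrame({"value": range(50)})  # hypothetical toy data

# Same random_state -> same rows, in the same order, on every call
a = df.sample(frac=0.8, random_state=42)
b = df.sample(frac=0.8, random_state=42)
print(a.index.equals(b.index))  # True

# Any other fixed integer is just as reproducible; 42 is only a convention
c = df.sample(frac=0.8, random_state=7)
d = df.sample(frac=0.8, random_state=7)
print(c.index.equals(d.index))  # True
```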

In your case you really only have 2 columns in your dataframe: year and population. Do the split, then take the year column as X and the population column as Y, and plot it. If it looks linear, do a linear regression; if it does not, consider a polynomial fit.
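One way to let the validation set make the "linear or polynomial" decision is to fit both with numpy.polyfit on the training rows, compare their error on the validation rows, and only then evaluate the chosen model once on the test set. This is a minimal sketch: the quadratic toy data, the helper names fit and rmse, and the use of RMSE as the metric are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: population that grows quadratically with the year
years = np.arange(1950, 2021)
df = pd.DataFrame({"year": years, "population": 1000.0 + 2.0 * (years - 1950) ** 2})

# Same two-stage 80/20 split as in the post
trainval = df.sample(frac=0.8, random_state=42)
test = df.drop(trainval.index)
train = trainval.sample(frac=0.8, random_state=42)
validate = trainval.drop(train.index)

X0 = train["year"].mean()  # centre the years so the polynomial fit is well conditioned

def fit(degree):
    """Hypothetical helper: least-squares polynomial of population on centred year, train rows only."""
    return np.polyfit(train["year"] - X0, train["population"], deg=degree)

def rmse(coeffs, data):
    """Hypothetical helper: root-mean-square error of the fitted polynomial on another split."""
    pred = np.polyval(coeffs, data["year"] - X0)
    return float(np.sqrt(np.mean((data["population"] - pred) ** 2)))

# Choose between a straight line (degree 1) and a quadratic using validation error only
val_scores = {deg: rmse(fit(deg), validate) for deg in (1, 2)}
best_degree = min(val_scores, key=val_scores.get)

# Touch the test set once, after the choice is made
test_rmse = rmse(fit(best_degree), test)
print(best_degree, val_scores, test_rmse)
```

On this noise-free toy data the quadratic wins clearly. With real, noisy data the gap may be small, and when the validation errors are close the simpler linear model is usually the safer choice.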
