Join Predicted values with test dataset

**scidam** · Mar-26-2019, 02:21 AM

Could you provide the full code? The problem is not clear: why do you need to join the arrays? You can check their shapes, e.g. using len, or, if they are NumPy arrays, via the .shape attribute.
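As a quick illustration of that shape check, here is a minimal sketch. The arrays `y_test` and `y_pred` below are made-up stand-ins, not the arrays from the original question.

```python
import numpy as np

# Hypothetical stand-ins for the arrays in question
y_test = np.array([0, 1, 1, 0, 2])
y_pred = np.array([0, 1, 0, 0, 2])

print(len(y_test), len(y_pred))    # number of elements along the first axis
print(y_test.shape, y_pred.shape)  # full shape tuples

# An element-wise comparison only makes sense when the shapes agree
same_shape = y_test.shape == y_pred.shape
print(same_shape)
```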

Let's consider the common steps of verifying an ML model in general.

1) You have an original dataset X and class labels y. Suppose these arrays have shapes (n, m) and (n,) respectively, i.e. we have m features (columns), n measurements (rows), and n class labels, one per row. The classes can be encoded as integers (some ML frameworks work only with numerical values).

2) We could train our classifier (or model) on X and y, apply the trained model to X to get y_pred with the same shape as y, and compute some quality measure such as precision, recall, or accuracy: measure_score(y, y_pred) => some value. Unfortunately, measures computed this way are overestimated, because the model is evaluated on the same rows it was fitted to. This is the overfitting problem.

3) A common way to get around this problem is to split the original dataset (X, y) into two datasets: (X_train, y_train) and (X_test, y_test). Usually the split is random, e.g. 85% of the rows of X (and the corresponding entries of y) are randomly selected for X_train and y_train, and the remaining 15% are used for X_test and y_test. The first pair, (X_train, y_train), is used to train the model. The second pair, which was not shown to the model, is used for testing: we apply the model to X_test and compare the resulting y_pred with y_test; these vectors have the same size.
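To make steps 2 and 3 concrete, here is a small sketch under stated assumptions: the data are synthetic with purely random labels, and the classifier is a hand-written 1-nearest-neighbour rule chosen only because it memorises its training rows. Neither comes from the original question, and the helper names `predict_1nn` and `accuracy` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n = 200 rows, m = 5 features. Labels are pure noise,
# so no model can genuinely do much better than chance on unseen rows.
n, m = 200, 5
X = rng.normal(size=(n, m))
y = rng.integers(0, 2, size=n)

# Random 85% / 15% split of the row indices
idx = rng.permutation(n)
n_train = round(0.85 * n)
train_idx, test_idx = idx[:n_train], idx[n_train:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

def predict_1nn(X_ref, y_ref, X_new):
    """Give each row of X_new the label of its nearest row in X_ref."""
    # Pairwise squared Euclidean distances, shape (len(X_new), len(X_ref))
    d2 = ((X_new[:, None, :] - X_ref[None, :, :]) ** 2).sum(axis=2)
    return y_ref[d2.argmin(axis=1)]

def accuracy(y_true, y_hat):
    """Fraction of positions where the two label vectors agree."""
    return float(np.mean(y_true == y_hat))

# Evaluating on the training rows: every row is its own nearest neighbour
train_acc = accuracy(y_train, predict_1nn(X_train, y_train, X_train))
# Evaluating on rows the model has never seen
test_acc = accuracy(y_test, predict_1nn(X_train, y_train, X_test))
print(train_acc, test_acc)
```

The training accuracy is perfect by construction, even though the labels carry no information, while the test accuracy reflects what the model can actually do on new rows.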

So, pseudocode would be the following:

X, y -- original dataset

(X_train, y_train), (X_test, y_test) = split_data(X, y)

model -- ML model used to solve the classification problem

model.fit(X_train, y_train) -- fit the model on the training data

-- the model is now fitted, and we want to estimate its accuracy

y_pred = model.predict(X_test) -- predict classes on the test data

some_accuracy_measure(y_test, y_pred) => float value (usually in [0, 1])
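The pseudocode maps fairly directly onto scikit-learn. The following is only a sketch under assumptions: the iris dataset stands in for the original (X, y), logistic regression stands in for the model, and accuracy is used as the measure. None of these choices come from the original question. The last lines show one possible way to place the predictions next to the test rows, if that is what "joining" means here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: the iris dataset stands in for the original (X, y)
X, y = load_iris(return_X_y=True)

# Random 85% / 15% split; random_state only makes the run repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=0
)

# Assumption: logistic regression stands in for "model"
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict classes for rows the model has not seen, then score them
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(y_pred.shape == y_test.shape, acc)

# One way to join predictions with the test rows:
# columns are the features, then the true label, then the predicted label
joined = np.column_stack([X_test, y_test, y_pred])
print(joined.shape)
```

Because y_pred and y_test have the same length as X_test, stacking them column-wise is safe; if the shapes disagreed, that would be the first thing to investigate.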
