Feb-01-2020, 10:47 PM
Good evening, kind of a beginner question here on how to properly feed new data (i.e. data not seen during model development) to a binary classification model. I have run my training and test data through several models and have chosen one of them (Logistic Regression) to proceed with. I took my new dataset, split it into two separate datasets (one with the 0s and the other with the 1s), and fed each to the model. I'm figuring this is dead wrong?
Any help on the matter would be greatly appreciated. Below is my code attempt:
from numpy import where
from matplotlib import pyplot

# load new data (load_dataset and model are defined earlier in my script)
eval_file = 'StickyNickel_NewData.csv'
X_new, y_new = load_dataset(eval_file)

# separate the data between 0 and 1 classes
# first non-sticky nickel (class 0)
row_ix = where(y_new == 0)[0]
X_new_0 = X_new[row_ix]
# second sticky nickel (class 1)
row_ix = where(y_new == 1)[0]
X_new_1 = X_new[row_ix]

results = list()

# predict non-sticky cases
yhat = model.predict_proba(X_new_0)
mean_0 = yhat.mean(0)[0]  # mean of column 0 of predict_proba
print(mean_0)
results.append(yhat[:, 0])

# predict sticky cases
yhat = model.predict_proba(X_new_1)
mean_1 = yhat.mean(0)[0]
print(mean_1)
results.append(yhat[:, 0])

pyplot.boxplot(results, labels=['0', '1'])
pyplot.show()
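For comparison, here is a minimal sketch of the alternative I am wondering about: scoring all of the new rows in a single predict_proba call and only grouping the results by the known label afterwards, for reporting. The helper name scores_by_true_label and the StandInModel class are hypothetical and exist only so the sketch runs on its own; in my script the real fitted model and X_new, y_new would take their place. It also assumes column 1 of predict_proba is the probability of class 1, which depends on the order of the model's classes.

```python
import math
from statistics import mean


def scores_by_true_label(model, X_new, y_new, positive_column=1):
    """Score every new row in one predict_proba call, then group the
    positive-class probabilities by the known label of each row."""
    probs = model.predict_proba(X_new)  # one row per sample, one column per class
    groups = {}
    for row, label in zip(probs, y_new):
        groups.setdefault(int(label), []).append(float(row[positive_column]))
    return groups


class StandInModel:
    """Hypothetical stand-in for the fitted model, used only so the sketch runs."""

    def predict_proba(self, X):
        out = []
        for features in X:
            p1 = 1.0 / (1.0 + math.exp(-features[0]))  # sigmoid of the first feature
            out.append([1.0 - p1, p1])
        return out


# Tiny made-up dataset, for illustration only.
X_demo = [[-2.0], [-1.0], [1.5], [3.0]]
y_demo = [0, 0, 1, 1]

groups = scores_by_true_label(StandInModel(), X_demo, y_demo)
means = {label: mean(values) for label, values in groups.items()}
print(means)
```

The point of the sketch is that the model never sees the labels when predicting, so the split by class is only a convenience for the summary and the boxplot, not something the prediction step needs.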