Jan-24-2021, 06:19 PM
I'm working on a homework assignment to predict hospital readmissions based on a dataset my school provided. I found a similar walkthrough (https://medium.com/@awlong20/i-added-the...006defd960) that uses a different dataset, but I believe it's similar enough to help.
However, K-nearest neighbors and Logistic Regression (haven't done others yet) are both showing an AUC of 1.000. Is this overfitting? If so, what can I do to fix it? I'm happy to share more code if it's helpful. Also, I've tried the thresh as .50 and .36.
KNN
Training:
AUC:1.000
accuracy:0.851
recall:1.000
precision:0.770
specificity:0.664
prevalence:0.500
Validation:
AUC:1.000
accuracy:0.811
recall:1.000
precision:0.655
specificity:0.670
prevalence:0.360
from sklearn.metrics import roc_auc_score, accuracy_score, precision_score, recall_score

def calc_prevalence(y_actual):
    # fraction of positive cases (not shown in my original snippet; standard definition)
    return sum(y_actual == 1) / len(y_actual)

def calc_specificity(y_actual, y_pred, thresh):
    # calculates specificity: fraction of true negatives scored below the threshold
    return sum((y_pred < thresh) & (y_actual == 0)) / sum(y_actual == 0)

def print_report(y_actual, y_pred, thresh):
    auc = roc_auc_score(y_actual, y_pred)
    accuracy = accuracy_score(y_actual, (y_pred > thresh))
    recall = recall_score(y_actual, (y_pred > thresh))
    precision = precision_score(y_actual, (y_pred > thresh))
    specificity = calc_specificity(y_actual, y_pred, thresh)
    print('AUC:%.3f' % auc)
    print('accuracy:%.3f' % accuracy)
    print('recall:%.3f' % recall)
    print('precision:%.3f' % precision)
    print('specificity:%.3f' % specificity)
    print('prevalence:%.3f' % calc_prevalence(y_actual))
    print(' ')
    return auc, accuracy, recall, precision, specificity
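To show what the thresh value is doing, here's a tiny self-contained sketch (plain NumPy, made-up probabilities, no model) of how thresholding scores affects specificity the way my calc_specificity helper computes it:

```python
import numpy as np

def calc_specificity(y_actual, y_pred, thresh):
    # fraction of true negatives correctly scored below the threshold
    return sum((y_pred < thresh) & (y_actual == 0)) / sum(y_actual == 0)

# made-up predicted probabilities: four negatives, two positives
y_actual = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0.2, 0.4, 0.6, 0.3, 0.9, 0.7])

# 3 of 4 negatives fall below 0.50 -> specificity 0.75
print(calc_specificity(y_actual, y_pred, 0.50))
# only 2 of 4 negatives fall below 0.36 -> specificity 0.50
print(calc_specificity(y_actual, y_pred, 0.36))
```

Lowering the threshold flips more cases to the positive label, which raises recall but drops specificity, which is the trade-off I'm seeing between .50 and .36.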