Python Forum

Full Version: AUC and other training/validation coming in at 1.000...is this overfitting
I'm working on a homework assignment to predict hospital readmissions from a dataset my school provided. I found a similar walkthrough (https://medium.com/@awlong20/i-added-the...006defd960) that uses a different dataset, but I believe it is close enough to be helpful.

However, K-nearest neighbors and logistic regression (I haven't tried other models yet) are both showing an AUC of 1.000. Is this overfitting? If so, what can I do to fix it? I'm happy to share more code if that would help. Also, I've tried thresholds of 0.50 and 0.36.

KNN
Training:
AUC:1.000
accuracy:0.851
recall:1.000
precision:0.770
specificity:0.664
prevalence:0.500

Validation:
AUC:1.000
accuracy:0.811
recall:1.000
precision:0.655
specificity:0.670
prevalence:0.360



from sklearn.metrics import roc_auc_score, accuracy_score, precision_score, recall_score

def calc_prevalence(y_actual):
    # fraction of positive cases in the sample
    return sum(y_actual == 1) / len(y_actual)

def calc_specificity(y_actual, y_pred, thresh):
    # true-negative rate: predicted negative among all actual negatives
    return sum((y_pred < thresh) & (y_actual == 0)) / sum(y_actual == 0)

def print_report(y_actual, y_pred, thresh):
    # y_pred is expected to be predicted probabilities; the threshold
    # converts them to 0/1 labels for the label-based metrics
    auc = roc_auc_score(y_actual, y_pred)
    accuracy = accuracy_score(y_actual, (y_pred > thresh))
    recall = recall_score(y_actual, (y_pred > thresh))
    precision = precision_score(y_actual, (y_pred > thresh))
    specificity = calc_specificity(y_actual, y_pred, thresh)
    print('AUC:%.3f' % auc)
    print('accuracy:%.3f' % accuracy)
    print('recall:%.3f' % recall)
    print('precision:%.3f' % precision)
    print('specificity:%.3f' % specificity)
    print('prevalence:%.3f' % calc_prevalence(y_actual))
    print(' ')
    return auc, accuracy, recall, precision, specificity
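For reference, here is a minimal, self-contained sketch (synthetic data via make_classification, not the homework dataset) of the usual pattern these metrics assume: roc_auc_score should receive the predicted probability of the positive class from predict_proba, not hard 0/1 labels from predict. Passing thresholded labels collapses the ROC curve to a single point and distorts the AUC, which is one thing worth ruling out when AUC comes in at exactly 1.000.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import roc_auc_score

# synthetic stand-in for the readmissions data
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# probability of the positive class -- what roc_auc_score expects
proba = knn.predict_proba(X_valid)[:, 1]
auc_proba = roc_auc_score(y_valid, proba)

# hard 0/1 labels instead: ROC collapses to one operating point
labels = knn.predict(X_valid)
auc_labels = roc_auc_score(y_valid, labels)

print('AUC from probabilities: %.3f' % auc_proba)
print('AUC from hard labels:   %.3f' % auc_labels)
```

If the probabilities are already being used and validation AUC is still exactly 1.000 with recall 1.000, it may be worth checking the features for target leakage (a column that encodes the label), since plain overfitting would normally show a large gap between training and validation scores rather than a perfect score on both.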