Sep-02-2022, 09:20 PM
Hello Comrades, as someone who is very new to Python, I keep learning and trying new things each day. Here is some code I have put together in an attempt to evaluate the performance of a machine learning algorithm using resampling, first with a single train/test split and then with k-fold cross-validation. Please spare a few minutes of your time to review it and help me improve it. Many thanks in advance.
# Evaluate using a train and a test set: Method 1
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

filename = 'pima-indians-diabetes.data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pd.read_csv(filename, names=names)
array = dataframe.values

# Separate data into X (features) and Y (target) components
X = array[:, 0:8]
Y = array[:, 8]

# Split the data into train and test sets
test_size = 0.33
seed = 7
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)

# Fit on the training set, then report accuracy on the held-out test set
model = LogisticRegression(solver='newton-cg')
model.fit(X_train, Y_train)
result = model.score(X_test, Y_test)
print('Accuracy: %.3f%%' % (result * 100.0))
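To check my own understanding of the number printed by Method 1: as far as I know, model.score for a classifier returns the mean accuracy, i.e. the fraction of test rows whose predicted class equals the true class. Below is a minimal sketch of that calculation in plain Python. The accuracy function and the label lists are hypothetical, made up for illustration, and are not taken from the diabetes data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy of an empty sample")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels for illustration only (not from the dataset).
y_true = [1, 0, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]

result = accuracy(y_true, y_pred)
print('Accuracy: %.3f%%' % (result * 100.0))
```

Because this is a single split, the figure depends on which rows happen to land in the test set, which is the reason for trying Method 2 as well.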
# Evaluate using K-fold Cross Validation: Method 2
import pandas as pd
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

filename = 'pima-indians-diabetes.data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pd.read_csv(filename, names=names)
array = dataframe.values

# Separate the data into X (features) and y (target) components
X = array[:, 0:8]
y = array[:, 8]

# Set up the validation parameters
num_folds = 10
seed = 7
kfold = KFold(n_splits=num_folds, shuffle=True, random_state=seed)

# Score the model on each fold, then report mean accuracy and its standard deviation
model = LogisticRegression(solver='newton-cg')
results = cross_val_score(model, X, y, cv=kfold)
print('Accuracy: %.3f%% (%.3f%%)' % (results.mean() * 100.0, results.std() * 100.0))
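For anyone reviewing Method 2, this is roughly how I picture the k-fold partitioning: shuffle the row indices once, cut them into num_folds chunks, and let each chunk take one turn as the test set while the rest is used for training. The sketch below uses only the standard library. The kfold_indices function is a hypothetical helper of mine, not scikit-learn's implementation, and its shuffle will not reproduce the exact folds that KFold produces with random_state=7.

```python
import random


def kfold_indices(n_samples, n_splits, seed):
    """Yield (train_idx, test_idx) pairs for a shuffled k-fold split."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % n_splits) folds get one extra sample.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Small made-up sample size so the folds are easy to inspect by eye.
folds = list(kfold_indices(n_samples=25, n_splits=10, seed=7))
for number, (train_idx, test_idx) in enumerate(folds, start=1):
    print('Fold %2d: %d train rows, %d test rows' % (number, len(train_idx), len(test_idx)))
```

Every row ends up in a test set exactly once, so the mean of the per-fold accuracies uses all of the data for evaluation, and the standard deviation shows how much the score moves from fold to fold.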