Evaluating the Performance of Machine Learning Algorithms

Evaluating the Performance of Machine Learning Algorithms - FelixLarry - Sep-02-2022

Hello Comrades, I am very new to Python and try to learn something new each day. Here is some code I have put together in an attempt to evaluate the performance of a machine learning algorithm using the resampling approach. Please spare a few minutes of your time to review it and suggest improvements. Many thanks in advance.

# Evaluate using a train and a test set: Method 1
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
filename = 'pima-indians-diabetes.data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pd.read_csv(filename, names=names)
array = dataframe.values
# Separate data into X and Y components
X = array[:,0:8]
Y = array[:,8]
# Splitting data into train and test
test_size = 0.33
seed = 7
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)
model = LogisticRegression(solver='newton-cg')
model.fit(X_train, Y_train)
result = model.score(X_test, Y_test)
print('Accuracy: %.3f%%' % (result * 100.0))
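
To make it clearer what the train/test split in Method 1 is doing, here is a minimal sketch in plain Python. The helper name holdout_split is hypothetical (it is not part of scikit-learn), and 768 is assumed to be the number of rows in the Pima file. It shuffles the row indices with a fixed seed and holds out roughly a third of them for testing. The sizes should match what train_test_split produces for test_size=0.33, but the actual rows chosen will differ because the random number generator is different.

```python
import math
import random

def holdout_split(n_samples, test_size=0.33, seed=7):
    """Hypothetical helper: return (train_idx, test_idx) for a shuffled hold-out split."""
    indices = list(range(n_samples))
    # A seeded generator makes the shuffle reproducible from run to run
    random.Random(seed).shuffle(indices)
    # Round the test portion up so it is never smaller than requested
    n_test = math.ceil(n_samples * test_size)
    return indices[n_test:], indices[:n_test]

# 768 rows is an assumption about the size of the Pima dataset
train_idx, test_idx = holdout_split(768)
print(len(train_idx), len(test_idx))
```

The model only ever sees the rows in train_idx during fitting, so the score on test_idx is an estimate of how it behaves on unseen data. With a single split that estimate can move noticeably if the seed changes.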

# Evaluate using K-fold Cross Validation: Method 2
import pandas as pd
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
filename = 'pima-indians-diabetes.data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pd.read_csv(filename, names=names)
array = dataframe.values
# Separate the data into X and Y components
X = array[:,0:8]
y = array[:,8]
# Setting up the validation parameters
num_folds = 10
seed = 7
kfold = KFold(n_splits=num_folds, shuffle=True, random_state=seed)
model = LogisticRegression(solver='newton-cg')
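# Fit and score the model once per fold; results holds one accuracy per fold
results = cross_val_score(model, X, y, cv=kfold)
# Report the mean accuracy and its standard deviation across the folds
print('Accuracy: %.3f%% (%.3f%%)' % (results.mean() * 100.0, results.std() * 100.0))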
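
For anyone wondering what KFold with shuffle=True does in Method 2, the following is a rough sketch in plain Python. The helper name kfold_indices is hypothetical and is not the scikit-learn implementation, and 768 rows is again an assumption about the dataset. It shuffles the row indices once, cuts them into 10 nearly equal folds, and uses each fold in turn as the test set while the remaining rows form the training set.

```python
import random

def kfold_indices(n_samples, n_splits=10, seed=7):
    """Hypothetical helper: yield (train_idx, test_idx) pairs for shuffled K-fold CV."""
    indices = list(range(n_samples))
    # Shuffle once up front so every fold is a random sample of the rows
    random.Random(seed).shuffle(indices)
    # When n_samples is not divisible by n_splits, the first folds get one extra row
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# 768 rows is an assumption about the size of the Pima dataset
fold_sizes = [len(test_idx) for _, test_idx in kfold_indices(768)]
print(fold_sizes)
```

Every row lands in a test fold exactly once, which is why the mean over the 10 scores is usually a steadier estimate than a single train/test split, and the standard deviation gives a feel for how much the score depends on which rows were held out.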