data = pd.read_csv('data.csv')
X = data[:,:-1]
Y = data['Outcome']
X_train,X_test,Y_train,Y_test = train_test_split(X,Y, test_size=0.3)
model = GaussianNB()
model.fit(X_train,Y_train)
y_pred = model.predict(X_test)
acc = accuracy_score(Y_test,y_pred)
cm = confusion_matrix(y_pred,Y_test)
print(cm)
print(acc)
For starters, you're not loading the file diabetes.csv (your script reads 'data.csv').
Then, here is your script with print diagnostics added (it still has some errors, left for you to fix):
import pandas as pd
from sklearn.model_selection import train_test_split
import os
# I need next line on my system to show where input file is located (same dir as script)
os.chdir(os.path.abspath(os.path.dirname(__file__)))
data = pd.read_csv('diabetes.csv')
# This will print entire dataframe (with ellipsis)
print(f"All data:\n{data}")
X = data[1:-1]
print(f"\nAll but last row, X:\n{X}")
Y = data['Outcome']
print(f"\nLast column:Y\n{Y}")
X_train,X_test,Y_train,Y_test = train_test_split(X,Y, test_size=0.3)
model = GaussianNB()
model.fit(X_train,Y_train)
y_pred = model.predict(X_test)
acc = accuracy_score(Y_test,y_pred)
cm = confusion_matrix(y_pred,Y_test)
print(cm)
print(acc)
Output:
All data:
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome
0 6 148 72 35 0 33.6 0.627 50 1
1 1 85 66 29 0 26.6 0.351 31 0
2 8 183 64 0 0 23.3 0.672 32 1
3 1 89 66 23 94 28.1 0.167 21 0
4 0 137 40 35 168 43.1 2.288 33 1
.. ... ... ... ... ... ... ... ... ...
763 10 101 76 48 180 32.9 0.171 63 0
764 2 122 70 27 0 36.8 0.340 27 0
765 5 121 72 23 112 26.2 0.245 30 0
766 1 126 60 0 0 30.1 0.349 47 1
767 1 93 70 31 0 30.4 0.315 23 0
[768 rows x 9 columns]
All but last row, X:
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome
1 1 85 66 29 0 26.6 0.351 31 0
2 8 183 64 0 0 23.3 0.672 32 1
3 1 89 66 23 94 28.1 0.167 21 0
4 0 137 40 35 168 43.1 2.288 33 1
5 5 116 74 0 0 25.6 0.201 30 0
.. ... ... ... ... ... ... ... ... ...
762 9 89 62 0 0 22.5 0.142 33 0
763 10 101 76 48 180 32.9 0.171 63 0
764 2 122 70 27 0 36.8 0.340 27 0
765 5 121 72 23 112 26.2 0.245 30 0
766 1 126 60 0 0 30.1 0.349 47 1
[766 rows x 9 columns]
Last column:Y
0 1
1 0
2 1
3 0
4 1
..
763 0
764 0
765 0
766 1
767 0
Name: Outcome, Length: 768, dtype: int64
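The printout above already hints at the problem: `data[1:-1]` slices rows, not columns. It drops the first and the last row, so X ends up with 766 rows and still contains the Outcome column, while Y keeps all 768 rows. Here is a minimal sketch of the difference, using a tiny hand-made frame instead of diabetes.csv. The name `demo` and the choice of three columns are my assumptions for illustration; the values are copied from the first rows of the printout.

```python
import pandas as pd

# Small stand-in for diabetes.csv: three of the columns, first five rows of the printout
demo = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
    "Outcome": [1, 0, 1, 0, 1],
})

row_slice = demo[1:-1]        # positional ROW slice: drops first and last row, keeps all columns
features = demo.iloc[:, :-1]  # all rows, every column except the last one
target = demo["Outcome"]      # all rows, label column only

print(row_slice.shape)  # (3, 3)
print(features.shape)   # (5, 2)
print(target.shape)     # (5,)
```

Only `features` has the same number of rows as `target`, which is what train_test_split needs.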
New error:
Error:
Traceback (most recent call last):
  File "/media/larz/Projects/projects/QRST/T/TryStuffNew/src/Jul_15_2022_1.py", line 21, in <module>
    X_train,X_test,Y_train,Y_test = train_test_split(X,Y, test_size=0.3)
  File "/media/larz/Projects/projects/QRST/T/TryStuffNew/venv/lib/python3.10/site-packages/sklearn/model_selection/_split.py", line 2430, in train_test_split
    arrays = indexable(*arrays)
  File "/media/larz/Projects/projects/QRST/T/TryStuffNew/venv/lib/python3.10/site-packages/sklearn/utils/validation.py", line 433, in indexable
    check_consistent_length(*result)
  File "/media/larz/Projects/projects/QRST/T/TryStuffNew/venv/lib/python3.10/site-packages/sklearn/utils/validation.py", line 387, in check_consistent_length
    raise ValueError(
ValueError: Found input variables with inconsistent numbers of samples: [766, 768]
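The ValueError means X has 766 samples and Y has 768, so train_test_split refuses to pair them up. Once that is fixed, the script would most likely stop again with a NameError, because GaussianNB, accuracy_score and confusion_matrix are never imported. Also note that scikit-learn's confusion_matrix takes the true labels first and the predictions second; swapping them transposes the matrix. Below is one possible sketch of a corrected version, not the only way to do it. The function name `fit_and_score` and the `seed` parameter are hypothetical, and the data frame is synthetic (made-up numbers, so the block runs without diabetes.csv).

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def fit_and_score(data, target_col="Outcome", seed=0):
    """Train GaussianNB on every column except target_col; return (confusion matrix, accuracy)."""
    X = data.drop(columns=[target_col])  # all rows, feature columns only
    Y = data[target_col]                 # same number of rows as X
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=0.3, random_state=seed
    )
    model = GaussianNB()
    model.fit(X_train, Y_train)
    y_pred = model.predict(X_test)
    # True labels first, predictions second
    cm = confusion_matrix(Y_test, y_pred)
    acc = accuracy_score(Y_test, y_pred)
    return cm, acc


# Synthetic stand-in for diabetes.csv (assumption: invented values, NOT the real data set)
rng = np.random.default_rng(0)
outcome = rng.integers(0, 2, size=100)
demo = pd.DataFrame({
    "Glucose": 100 + 40 * outcome + rng.normal(0, 10, size=100),
    "BMI": 28 + 5 * outcome + rng.normal(0, 3, size=100),
    "Outcome": outcome,
})

cm, acc = fit_and_score(demo)
print(cm)
print(acc)
```

With the real file, something like `fit_and_score(pd.read_csv('diabetes.csv'))` should behave the same way, assuming the label column is named Outcome as in the printout above.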