Python Forum

Full Version: InvalidIndexError: (slice(None, None, None), slice(None, -1, None))
data = pd.read_csv('data.csv')

X = data[:,:-1]
Y = data['Outcome']

X_train,X_test,Y_train,Y_test = train_test_split(X,Y, test_size=0.3)

model = GaussianNB()

model.fit(X_train,Y_train)

y_pred = model.predict(X_test)

acc = accuracy_score(Y_test,y_pred)
cm = confusion_matrix(y_pred,Y_test)

print(cm)
print(acc)
[attachment=1850]
For starters, you're not loading the file diabetes.csv (your code reads 'data.csv').

Then, with print diagnostics (still some errors -- left for you to fix):
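import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix
import os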


# I need next line on my system to show where input file is located (same dir as script)
os.chdir(os.path.abspath(os.path.dirname(__file__)))
data = pd.read_csv('diabetes.csv')

# This will print entire dataframe (with ellipsis)
print(f"All data:\n{data}")

X = data[1:-1]  # row slice: drops the first and last rows, keeps every column
print(f"\nAll but last row, X:\n{X}")

Y = data['Outcome']
print(f"\nLast column:Y\n{Y}")

X_train,X_test,Y_train,Y_test = train_test_split(X,Y, test_size=0.3)
 
model = GaussianNB()
 
model.fit(X_train,Y_train)
 
y_pred = model.predict(X_test)
 
acc = accuracy_score(Y_test,y_pred)
cm = confusion_matrix(y_pred,Y_test)
 
print(cm)
print(acc)
Output:
All data:
     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  Age  Outcome
0              6      148             72             35        0  33.6                     0.627   50        1
1              1       85             66             29        0  26.6                     0.351   31        0
2              8      183             64              0        0  23.3                     0.672   32        1
3              1       89             66             23       94  28.1                     0.167   21        0
4              0      137             40             35      168  43.1                     2.288   33        1
..           ...      ...            ...            ...      ...   ...                       ...  ...      ...
763           10      101             76             48      180  32.9                     0.171   63        0
764            2      122             70             27        0  36.8                     0.340   27        0
765            5      121             72             23      112  26.2                     0.245   30        0
766            1      126             60              0        0  30.1                     0.349   47        1
767            1       93             70             31        0  30.4                     0.315   23        0

[768 rows x 9 columns]

All but last row, X:
     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  Age  Outcome
1              1       85             66             29        0  26.6                     0.351   31        0
2              8      183             64              0        0  23.3                     0.672   32        1
3              1       89             66             23       94  28.1                     0.167   21        0
4              0      137             40             35      168  43.1                     2.288   33        1
5              5      116             74              0        0  25.6                     0.201   30        0
..           ...      ...            ...            ...      ...   ...                       ...  ...      ...
762            9       89             62              0        0  22.5                     0.142   33        0
763           10      101             76             48      180  32.9                     0.171   63        0
764            2      122             70             27        0  36.8                     0.340   27        0
765            5      121             72             23      112  26.2                     0.245   30        0
766            1      126             60              0        0  30.1                     0.349   47        1

[766 rows x 9 columns]

Last column:Y
0      1
1      0
2      1
3      0
4      1
      ..
763    0
764    0
765    0
766    1
767    0
Name: Outcome, Length: 768, dtype: int64
New error:
Error:
Traceback (most recent call last):
  File "/media/larz/Projects/projects/QRST/T/TryStuffNew/src/Jul_15_2022_1.py", line 21, in <module>
    X_train,X_test,Y_train,Y_test = train_test_split(X,Y, test_size=0.3)
  File "/media/larz/Projects/projects/QRST/T/TryStuffNew/venv/lib/python3.10/site-packages/sklearn/model_selection/_split.py", line 2430, in train_test_split
    arrays = indexable(*arrays)
  File "/media/larz/Projects/projects/QRST/T/TryStuffNew/venv/lib/python3.10/site-packages/sklearn/utils/validation.py", line 433, in indexable
    check_consistent_length(*result)
  File "/media/larz/Projects/projects/QRST/T/TryStuffNew/venv/lib/python3.10/site-packages/sklearn/utils/validation.py", line 387, in check_consistent_length
    raise ValueError(
ValueError: Found input variables with inconsistent numbers of samples: [766, 768]
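
One possible way past both errors, as a minimal sketch rather than a definitive fix: data[:,:-1] is NumPy-style indexing, which a DataFrame's plain [] does not accept, and data[1:-1] slices rows, which is why X ends up with 766 samples against 768 in Y. Selecting columns by position with .iloc keeps every row. The small random DataFrame below is a made-up stand-in for diabetes.csv (only the 'Outcome' column name comes from the thread), and random_state=0 is an addition to make the split repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical stand-in for diabetes.csv: two feature columns, 'Outcome' last.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "Glucose": rng.integers(70, 200, size=100),
    "BMI": rng.uniform(18.0, 45.0, size=100).round(1),
    "Outcome": rng.integers(0, 2, size=100),
})

# Positional indexing: all rows, every column except the last one.
X = data.iloc[:, :-1]
# Equivalent selection by label: X = data.drop(columns="Outcome")
Y = data["Outcome"]

# X and Y now have the same number of rows, so the split no longer fails.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0
)

model = GaussianNB()
model.fit(X_train, Y_train)
y_pred = model.predict(X_test)

# scikit-learn expects the true labels first, then the predictions.
print(confusion_matrix(Y_test, y_pred))
print(accuracy_score(Y_test, y_pred))
```

With the real file, the stand-in would be replaced by data = pd.read_csv('diabetes.csv'). The sketch passes (Y_test, y_pred) to confusion_matrix instead of (y_pred, Y_test) because the function takes the true labels as its first argument; swapping them transposes the matrix.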