Python Forum
Index data must be 1-dimensional : Classifier with sklearn - Printable Version


Index data must be 1-dimensional : Classifier with sklearn - Salma - Apr-01-2021

Hi,
I'm a complete beginner trying to write a random forest classifier. As a first step, I want to find out which features are important for the classification.
I'm following this helpful guide: https://www.datacamp.com/community/tutorials/random-forests-classifier-python

However, I run into a problem when I use this code with my own data.
The code works fine (the classifier accuracy is printed) until the last line, which raises this error: ValueError: Index data must be 1-dimensional

My data look like this (first few rows):
Output:
LS   SSA  B   Fe    K    Mg   Mn  S    Ti   Zn  cytotoxicity_class
1,3  283  16  21    47   45   1   44   20   32  low
0,7  439  92  1008  201  304  13  136  34   12  low
0,5  692  97  589   708  182  6   421  108  8   high
Here is the code:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
data=pd.read_excel(r"C:\Users\ASUS\Desktop\Dataset_cluster_cytotoxicity.xlsx")
X=data[['LS', 'SSA', 'B', 'Fe', 'K', 'Mg', 'Mn', 'S', 'Ti', 'Zn']]  # Features
y=data['cytotoxicity_class']  # Labels
# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3) # 70% training and 30% test
# Import the random forest model
from sklearn.ensemble import RandomForestClassifier
# Create a random forest classifier with 100 trees
clf=RandomForestClassifier(n_estimators=100)
# Train the model on the training set
clf.fit(X_train,y_train)
# Predict the labels of the test set
y_pred=clf.predict(X_test)
#Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics
# Model Accuracy, how often is the classifier correct?
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))
# Rank the features by importance (this is the line that raises the ValueError)
feature_imp = pd.Series(clf.feature_importances_,index=X).sort_values(ascending=False)
Thank you for your help !
Salma
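
A likely cause, offered as a sketch rather than a definitive diagnosis: the last line passes the whole DataFrame X as the index, and X is 2-dimensional, whereas a Series index has to be 1-dimensional. Passing the column labels (X.columns) instead should give one label per importance value. The example below rebuilds only the three sample rows shown in the post (decimal commas written as points), fits on all of them without a train/test split, and fixes random_state; those choices are assumptions made to keep the example self-contained, and the name feature_names is introduced here for illustration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Assumption: only the three sample rows from the post, with decimal
# commas written as points. The real data comes from the Excel file.
feature_names = ["LS", "SSA", "B", "Fe", "K", "Mg", "Mn", "S", "Ti", "Zn"]
data = pd.DataFrame(
    [
        [1.3, 283, 16, 21, 47, 45, 1, 44, 20, 32, "low"],
        [0.7, 439, 92, 1008, 201, 304, 13, 136, 34, 12, "low"],
        [0.5, 692, 97, 589, 708, 182, 6, 421, 108, 8, "high"],
    ],
    columns=feature_names + ["cytotoxicity_class"],
)

X = data[feature_names]          # features (a 2-D DataFrame)
y = data["cytotoxicity_class"]   # labels

# Fit on all rows; three samples are far too few for a meaningful split.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# X has two dimensions, its column labels have one.
print("X.ndim =", X.ndim, "| X.columns.ndim =", X.columns.ndim)

# Use the 1-D column labels as the index instead of the DataFrame itself.
feature_imp = pd.Series(
    clf.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(feature_imp)
```

With the original script, the same change would be index=X.columns in the last line; the importance values from three rows are not meaningful, the point is only that the Series is built without the error.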