Index data must be 1-dimensional : Classifier with sklearn
Python Forum (https://python-forum.io), General Coding Help, thread /thread-33138.html
Index data must be 1-dimensional : Classifier with sklearn - Salma - Apr-01-2021

Hi,

As a complete beginner, I'm trying to write code for a random forest classifier, and I'm starting with a step to find out which features are important for classification. I found this helpful guide, which I'm currently following: https://www.datacamp.com/community/tutorials/random-forests-classifier-python

However, I have a problem when I use this code with my data. The code works fine (my classifier accuracy is printed) until the last line, where I get this error message:

ValueError: Index data must be 1-dimensional

My data look like this (first few rows):

Here is the code:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_excel(r"C:\Users\ASUS\Desktop\Dataset_cluster_cytotoxicity.xlsx")

X = data[['LS', 'SSA', 'B', 'Fe', 'K', 'Mg', 'Mn', 'S', 'Ti', 'Zn']]  # Features
y = data['cytotoxicity_class']  # Labels

# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)  # 70% training and 30% test

# Import Random Forest Model
from sklearn.ensemble import RandomForestClassifier

# Create a random forest classifier
clf = RandomForestClassifier(n_estimators=100)

# Train the model using the training sets
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

# Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics

# Model accuracy: how often is the classifier correct?
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

from sklearn.ensemble import RandomForestClassifier

# Create a random forest classifier
clf = RandomForestClassifier(n_estimators=100)

# Train the model using the training sets
clf.fit(X_train, y_train)

# Output of fit():
# RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
#             max_depth=None, max_features='auto', max_leaf_nodes=None,
#             min_impurity_decrease=0.0, min_impurity_split=None,
#             min_samples_leaf=1, min_samples_split=2,
#             min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=1,
#             oob_score=False, random_state=None, verbose=0,
#             warm_start=False)

# This is the line that raises the ValueError
feature_imp = pd.Series(clf.feature_importances_, index=X).sort_values(ascending=False)

Thank you for your help!

Salma
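
A likely cause, sketched under assumptions: in the last line, index=X passes the whole feature DataFrame (2-dimensional) as the index of the Series, while pandas expects a 1-dimensional list of labels there, one per importance value. Using the column names, X.columns, should give one label per feature. The example below is only an illustration: it replaces the Excel file with random synthetic data that reuses the same column names, so the importance values themselves are meaningless.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the Excel sheet (assumption: same column names, random values)
feature_names = ['LS', 'SSA', 'B', 'Fe', 'K', 'Mg', 'Mn', 'S', 'Ti', 'Zn']
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((60, len(feature_names))), columns=feature_names)
y = pd.Series(rng.integers(0, 2, size=60), name='cytotoxicity_class')

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# index=X hands a 2-D DataFrame to pd.Series as the index; catch and show the error
try:
    pd.Series(clf.feature_importances_, index=X)
except ValueError as exc:
    print("index=X fails:", exc)

# index=X.columns is 1-D: one feature name per importance value
feature_imp = pd.Series(clf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(feature_imp)
```

With the real data, the only change needed in the original script would be replacing index=X with index=X.columns in the last line.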