Apr-01-2021, 03:22 PM
Hi,
As a complete beginner, I'm trying to write code for a random forest classifier, and as a first step I want to find out which features are important for classification.
I'm currently following this helpful guide: https://www.datacamp.com/community/tutor...ier-python.
However, I have a problem when I use this code with my data.
The code works fine (the classifier accuracy is printed) until the last line, which raises this error: ValueError: Index data must be 1-dimensional
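A likely cause, worth checking: in the last line of the code, `index=X` passes the whole feature DataFrame, which is a 2-D table, while the `index` of a `pd.Series` has to be 1-D. A quick check of the two dimensionalities, using a hypothetical empty frame with the same ten feature columns in place of the real spreadsheet:

```python
import pandas as pd

# Hypothetical empty frame with the ten feature columns from the post
X = pd.DataFrame(columns=["LS", "SSA", "B", "Fe", "K", "Mg", "Mn", "S", "Ti", "Zn"])

print(X.ndim)          # 2: a DataFrame is a 2-D table
print(X.columns.ndim)  # 1: its column labels form a 1-D Index
```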
My data looks like this (first rows):
```
LS   SSA  B   Fe    K    Mg   Mn  S    Ti   Zn  cytotoxicity_class
1,3  283  16  21    47   45   1   44   20   32  low
0,7  439  92  1008  201  304  13  136  34   12  low
0,5  692  97  589   708  182  6   421  108  8   high
```
Here is the code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
# Import Random Forest model
from sklearn.ensemble import RandomForestClassifier
# Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics

data = pd.read_excel(r"C:\Users\ASUS\Desktop\Dataset_cluster_cytotoxicity.xlsx")
X = data[['LS', 'SSA', 'B', 'Fe', 'K', 'Mg', 'Mn', 'S', 'Ti', 'Zn']]  # Features
y = data['cytotoxicity_class']  # Labels

# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)  # 70% training and 30% test

# Create a random forest classifier with 100 trees
clf = RandomForestClassifier(n_estimators=100)

# Train the model using the training set, then predict the test set
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Model accuracy: how often is the classifier correct?
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

# Feature importances (this is the line that raises the ValueError)
feature_imp = pd.Series(clf.feature_importances_, index=X).sort_values(ascending=False)
```

Thank you for your help!
Salma
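A minimal sketch of the last step with `index=X.columns` (the 1-D column labels) instead of `index=X`. Assumptions: the three sample rows shown in the post stand in for the spreadsheet (decimal commas written as points), and `random_state=0` is added only to make the run repeatable. Three rows are far too few for a meaningful ranking; this only shows that the call works.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

features = ["LS", "SSA", "B", "Fe", "K", "Mg", "Mn", "S", "Ti", "Zn"]

# Assumed stand-in for the spreadsheet: only the three rows shown in the post
data = pd.DataFrame(
    [[1.3, 283, 16, 21, 47, 45, 1, 44, 20, 32, "low"],
     [0.7, 439, 92, 1008, 201, 304, 13, 136, 34, 12, "low"],
     [0.5, 692, 97, 589, 708, 182, 6, 421, 108, 8, "high"]],
    columns=features + ["cytotoxicity_class"],
)
X = data[features]
y = data["cytotoxicity_class"]

# random_state is an addition for repeatability, not part of the original code
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# The Series index must be 1-D: pass the column labels, not the DataFrame itself
feature_imp = pd.Series(clf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(feature_imp)
```

With the real data, the split, fit and accuracy steps can stay as they are; only `index=X` changes, to `index=X.columns` (or to the same list of column names used to build `X`).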