Hi,
As a complete beginner, I'm trying to write code for a random forest classifier, and I'm starting with a step to find out which features are important for classification.
I found this helpful guide that I'm currently using. https://www.datacamp.com/community/tutor...ier-python.
However, I have a problem when I use this code with my data.
The code works fine (my classifier accuracy is printed) until the last line, where I get this error message: ValueError: Index data must be 1-dimensional
My data look like this (first few lines):
Output:
LS   SSA  B   Fe    K    Mg   Mn  S    Ti   Zn  cytotoxicity_class
1,3  283  16  21    47   45   1   44   20   32  low
0,7  439  92  1008  201  304  13  136  34   12  low
0,5  692  97  589   708  182  6   421  108  8   high
Here is the code:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
data=pd.read_excel(r"C:\Users\ASUS\Desktop\Dataset_cluster_cytotoxicity.xlsx")
X=data[['LS', 'SSA', 'B', 'Fe', 'K', 'Mg', 'Mn', 'S', 'Ti', 'Zn']]  # Features
y=data['cytotoxicity_class']  # Labels
# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3) # 70% training and 30% test
#Import Random Forest Model
from sklearn.ensemble import RandomForestClassifier
#Create a Gaussian Classifier
clf=RandomForestClassifier(n_estimators=100)
#Train the model using the training sets y_pred=clf.predict(X_test)
clf.fit(X_train,y_train)
y_pred=clf.predict(X_test)
#Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics
# Model Accuracy, how often is the classifier correct?
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))
from sklearn.ensemble import RandomForestClassifier
#Create a Gaussian Classifier
clf=RandomForestClassifier(n_estimators=100)
#Train the model using the training sets y_pred=clf.predict(X_test)
clf.fit(X_train,y_train)
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
feature_imp = pd.Series(clf.feature_importances_,index=X).sort_values(ascending=False)
Thank you for your help!
Salma
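A possible explanation for the error, offered as a sketch and not a confirmed diagnosis: in the last line, index=X passes the whole feature DataFrame (two-dimensional) as the index, while pd.Series expects one label per importance value. Passing the column labels with index=X.columns gives a 1-dimensional index. The example below assumes synthetic random data in place of the Excel file, a made-up labelling rule for cytotoxicity_class, and a fixed random_state; none of these come from the original post.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the Excel sheet (values are made up for illustration).
rng = np.random.default_rng(0)
feature_names = ['LS', 'SSA', 'B', 'Fe', 'K', 'Mg', 'Mn', 'S', 'Ti', 'Zn']
data = pd.DataFrame(rng.random((60, len(feature_names))), columns=feature_names)

# Hypothetical labelling rule so the classifier has something to learn.
data['cytotoxicity_class'] = np.where(data['Fe'] + data['K'] > 1.0, 'high', 'low')

X = data[feature_names]          # Features
y = data['cytotoxicity_class']   # Labels

# random_state is fixed here only to make the sketch reproducible.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# index=X would pass the 2-D DataFrame itself;
# index=X.columns passes the 1-D column labels, one per importance value.
feature_imp = pd.Series(clf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(feature_imp)
```

With the real data, the only change to the posted code would be replacing index=X with index=X.columns in the last line.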
Larz60+ wrote Apr-02-2021, 01:17 AM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
Fixed for you this time. Please use bbcode tags on future posts.
Note: output tags are good for formatting input as well.