Jul-10-2017, 11:54 AM
I have a data set of 19 features (V1 to V19) and one class label (C1). I can easily get the precision-recall value for all variables together against the class label, but I want the AUCPR of each individual feature against the class label. The data is in this form:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 C1
4182 4182 4182 1 2 0 0 0 4 1 1 0 5 0 1 1 24 4.4654 28.18955043 1
11396 3798.6 3825 3 1 0 1 0 0 3 3 1 0 1 1 3 5 4.452 11.90765492 0
60416 5034.66 5393.5 12 1 0 0 0 0 12 12 3 6 1 4 12 2 4.4711 35.11543135 0
34580 4940 5254 7 1 4 0 2 0 10 12 8 0 1 1 10 45 4.4689 32.44228433 1
8667 4333.5 4333.5 2 1 0 1 0 0 2 2 1 0 1 0 2 1 4.4659 28.79708384 0
4011 4011 4011 1 1 30 0 0 0 2 2 1 8 1 0 2 1 4.4634 25.75941677 0
691347 5083.43 5300 136 2 0 0 0 9 44 44 12 0 1 12 44 32 4.4693 32.92831106 1
This is my code:

from collections import defaultdict
from sklearn.cross_validation import train_test_split
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
import numpy as np
from sklearn.metrics import average_precision_score

mydata = pd.read_csv("TEST_2.csv")
y = mydata["C1"]  #provided your csv has header row, and the label column is named "Label"
##select all but the last column as data
X = mydata.ix[:,:-1]
X=X.iloc[:,:]
names = X.iloc[:,:].columns.tolist()

# -- Gridsearched parameters
model_rf = RandomForestClassifier(n_estimators=500, class_weight="auto", criterion='gini', bootstrap=True, max_features=10, min_samples_split=1, min_samples_leaf=6, max_depth=3, n_jobs=-1)
scores = defaultdict(list)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5, random_state=0)

# -- Fit the model (could be cross-validated)
for i in range(X_train.shape[1]):
    X_t = X_test.copy()
    rf = model_rf.fit(X_train[:,i], y_train)
    scores[names[i]] = average_precision_score(y_test, rf.predict(X_t[:,i]))

print("Features sorted by their score:")
print(sorted([(round(np.mean(score), 4), feat) for feat, score in scores.items()], reverse=True))

It gives me an "unhashable type" error for X_train[:,i] and X_t[:,i].
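
A likely cause of the error: X_train and X_t are pandas DataFrames, and X_train[:,i] is NumPy-style indexing. On a DataFrame, pandas treats the tuple (slice, i) as a column key, and a slice cannot be hashed. Selecting a column by name with double brackets (or by position with .iloc[:, [i]]) keeps a 2-D single-column frame, which is the shape fit() expects. Below is a minimal sketch of a per-feature loop under some assumptions: the data is a synthetic stand-in generated with make_classification (not TEST_2.csv), the forest parameters are illustrative rather than the grid-searched ones, and imports and arguments follow newer scikit-learn releases (sklearn.model_selection instead of sklearn.cross_validation, class_weight="balanced" instead of "auto"). It also scores with predict_proba rather than predict, since average precision is computed from a ranking score and hard 0/1 predictions give only a single point on the curve.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for TEST_2.csv (assumption): 19 features V1..V19, binary label C1.
X_arr, y_arr = make_classification(
    n_samples=400, n_features=19, n_informative=5, random_state=0
)
names = ["V%d" % (i + 1) for i in range(19)]
X = pd.DataFrame(X_arr, columns=names)
y = pd.Series(y_arr, name="C1")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)

scores = {}
for name in names:
    # A fresh model per feature; parameters here are illustrative only.
    rf = RandomForestClassifier(
        n_estimators=50,
        max_depth=3,
        min_samples_leaf=6,
        class_weight="balanced",
        random_state=0,
    )
    # Double brackets keep a 2-D (n_samples, 1) DataFrame instead of a 1-D Series.
    rf.fit(X_train[[name]], y_train)
    # Probability of the positive class serves as the ranking score.
    proba = rf.predict_proba(X_test[[name]])[:, 1]
    scores[name] = average_precision_score(y_test, proba)

print("Features sorted by their score:")
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(name, round(score, 4))
```

With the real file, the DataFrame from pd.read_csv("TEST_2.csv") would replace the synthetic X and y. On newer library versions the original script would also need .ix replaced by .iloc and min_samples_split set to at least 2.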