Python Forum

I have a data set of 19 features (v1---v19) and one class label (c1) , I can eaily get the precision recall value of all variables with the class label, but I want the AUCPR of individual features with the class label The data is in this form

Quote:V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 C1
4182 4182 4182 1 2 0 0 0 4 1 1 0 5 0 1 1 24 4.4654 28.18955043 1
11396 3798.6 3825 3 1 0 1 0 0 3 3 1 0 1 1 3 5 4.452 11.90765492 0
60416 5034.66 5393.5 12 1 0 0 0 0 12 12 3 6 1 4 12 2 4.4711 35.11543135 0
34580 4940 5254 7 1 4 0 2 0 10 12 8 0 1 1 10 45 4.4689 32.44228433 1
8667 4333.5 4333.5 2 1 0 1 0 0 2 2 1 0 1 0 2 1 4.4659 28.79708384 0
4011 4011 4011 1 1 30 0 0 0 2 2 1 8 1 0 2 1 4.4634 25.75941677 0
691347 5083.43 5300 136 2 0 0 0 9 44 44 12 0 1 12 44 32 4.4693 32.92831106 1

from collections import defaultdict
from sklearn.cross_validation import train_test_split
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
import numpy as np
from sklearn.metrics import average_precision_score

mydata = pd.read_csv("TEST_2.csv")
y = mydata["C1"]  #provided your csv has header row, and the label column is named "Label"

##select all but the last column as data
X = mydata.ix[:,:-1]
X=X.iloc[:,:]
names = X.iloc[:,:].columns.tolist()
# -- Gridsearched parameters
model_rf = RandomForestClassifier(n_estimators=500,
                                 class_weight="auto",
                                 criterion='gini',
                                 bootstrap=True,
                                 max_features=10,
                                 min_samples_split=1,
                                 min_samples_leaf=6,
                                 max_depth=3,
                                 n_jobs=-1)
scores = defaultdict(list)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5,
                                                    random_state=0)
# -- Fit the model (could be cross-validated)



for i in range(X_train.shape[1]):
    X_t = X_test.copy()
    rf = model_rf.fit(X_train[:,i], y_train)
    scores[names[i]] = average_precision_score(y_test, rf.predict(X_t[:,i))




print("Features sorted by their score:")
print(sorted([(round(np.mean(score), 4), feat) for
              feat, score in scores.items()], reverse=True))

It is giving me unhashable type error for X_train[:,i]and X_t[:,i]

Please post the error code, in it's entirety, between the error code tags.

melissa

sparkz_alot