Python Forum

Full Version: cross validate
Could somebody please help with the implementation of this code?

We are getting the following error on the line that is bold and underlined: "TypeError: slice indices must be integers or None or have an __index__ method"

Additionally, we are having a problem evaluating the predicted outcomes against the actual labels.

def crossValidate(dataset, folds):
    shuffle(dataset)
    results = []
    foldSize = len(dataset)/folds
#for i in range(0,len(dataset),foldSize):
    
    for i in range(folds):
        pass
        Train_This_Data = dataset[:i*foldSize] + dataset[(i+1) * foldSize:]

        Test_This_Data = dataset[i*foldSize:(i+1) * foldSize]
        
        x_train= trainClassifier(Train_This_Data)
        y_pred = predictLabels(Test_This_Data, x_train)
        
        print(y_pred[0:10])
        
        for row in Test_This_Data, :
            y_true = row[0]
            print(y_true)
            
    #sklearn.metrics.precision_recall_fscore_support(y_true, y_pred, beta=1.0, labels=None, pos_label=1, average=None, warn_for=('precision', 'recall', 'f-score'), sample_weight=None)
    return results
For multi-line code, use python tags, without formatting tags. To others: the formatting I took out, which indicated the error, was on the Train_This_Data = line.

Are you working in Python 3.x? len(dataset)/folds may be returning a float and causing the error you're seeing. I would check the foldSize value before the slicing.

As for your predicted outcome problems, that sounds more like a model problem than a Python problem.
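The float-division point can be checked in isolation. The snippet below is only a sketch with stand-in data (the list of integers and the fold count are assumptions, not the original poster's dataset). It shows that on Python 3 `/` produces a float that cannot be used in a slice, while `//` produces an int that can. On Python 2, `/` between two ints already floors unless `from __future__ import division` is in effect.

```python
# Stand-in data: 25 dummy rows, not the original poster's dataset.
dataset = list(range(25))
folds = 10

fold_size_float = len(dataset) / folds   # true division on Python 3 -> 2.5
fold_size_int = len(dataset) // folds    # floor division -> 2

# Slicing with a float bound raises the TypeError from the question.
try:
    dataset[:1 * fold_size_float]
except TypeError as exc:
    error_message = str(exc)

# Slicing with an int bound works as expected.
test_fold = dataset[1 * fold_size_int:2 * fold_size_int]
print(error_message)
print(test_fold)
```

Note that with floor division any leftover rows (here 25 - 10 * 2 = 5) never land in a test fold; they simply stay in every training set.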
My observations:

There is HTML code mixed with the Python code.

On line 2, shuffle(dataset) uses a name that is not defined in the code shown; it needs an import such as from random import shuffle.
I fixed the problem, but now I am having a problem with train_set = dataset[:i*foldSize] + dataset[(i+1)*foldSize]
The error output is: TypeError: can only concatenate list (not "tuple") to list
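A likely cause, judging only from the line as posted: the second term has no trailing colon, so dataset[(i+1)*foldSize] is a single row (a tuple) rather than a slice (a list). The sketch below reproduces that with made-up rows; the (text, label) layout is an assumption for illustration.

```python
# Hypothetical rows in (text, label) form; the real dataset layout is not shown in the thread.
dataset = [("text a", 0), ("text b", 1), ("text c", 0), ("text d", 1)]
i, fold_size = 0, 1

head = dataset[:i * fold_size]                # a list (empty for i == 0)
single_row = dataset[(i + 1) * fold_size]     # indexing: one tuple, not a list

# list + tuple is not allowed, which matches the reported TypeError.
try:
    head + single_row
except TypeError as exc:
    error_message = str(exc)

# With the trailing colon the second term is a slice, i.e. a list.
train_set = head + dataset[(i + 1) * fold_size:]
print(error_message)
print(train_set)
```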
When posting an error, please post the entire Traceback within the "Error" tags (the little red "X" on the menu bar). Sometimes the error message can be misleading, so it is helpful to see the traceback in its entirety.
A new error occurs. I read through my cross validate function and it seems perfectly fine; it should cross validate the dataset and then classify each fold.
Traceback (most recent call last):
  File "C:\Users\users\Desktop\python\template.py", line 147, in <module>
    cv_results = crossValidate(trainData,10)
  File "C:\Users\users\Desktop\python\template.py", line 93, in crossValidate
    d_train = trainClassifier(train_set)
  File "C:\Users\users\Desktop\python\template.py", line 76, in trainClassifier
    return SklearnClassifier(LinearSVC()).train(trainData)
  File "C:\Python27\lib\site-packages\nltk\classify\scikitlearn.py", line 117, in train
    self._clf.fit(X, y)
  File "C:\Python27\lib\site-packages\sklearn\svm\classes.py", line 213, in fit
    self.loss)
  File "C:\Python27\lib\site-packages\sklearn\svm\base.py", line 885, in _fit_liblinear
    " class: %r" % classes_[0])
ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 0

This is the new function for cross validate
def crossValidate(dataset, folds):
    # shuffle, trainClassifier and predictLabels are defined elsewhere in the script
    shuffle(dataset)
    results = []
    foldSize = len(dataset) // folds
    for i in range(folds):
        # split data into train_set and test_set
        train_set = dataset[:i*foldSize] + dataset[(i+1)*foldSize:]
        test_set = dataset[i*foldSize:(i+1)*foldSize]
        # train classifier and predict labels for the held-out fold
        d_train = trainClassifier(train_set)
        y_pred = predictLabels(test_set, d_train)
        # collect the true labels (second element of each row)
        y_true = [row[1] for row in test_set]
        print(y_pred[0:10])
        # score once per fold, after y_true is complete
        cv_results = sklearn.metrics.precision_recall_fscore_support(y_true, y_pred)
        results.append(cv_results)
    return results
It's done after 9 hours, and thank you.
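For anyone who hits the same ValueError: it says the training data passed to the solver contained only the label 0. One way to narrow that down is to count the labels in each training split before calling the classifier. The helper below is a hypothetical diagnostic, not part of the original script; it assumes rows are (features, label) pairs with the label at index 1, and it shuffles a copy with a fixed seed so the check is repeatable.

```python
from collections import Counter
from random import Random


def training_label_counts(dataset, folds, label_index=1, seed=0):
    """Return one Counter of labels per training split (hypothetical helper)."""
    rows = list(dataset)            # copy so the caller's list is left untouched
    Random(seed).shuffle(rows)      # fixed seed: an assumption for repeatability
    fold_size = len(rows) // folds
    counts = []
    for i in range(folds):
        train_set = rows[:i * fold_size] + rows[(i + 1) * fold_size:]
        counts.append(Counter(row[label_index] for row in train_set))
    return counts


# Made-up data: one set where every label is 0, one with two alternating labels.
one_class = [("doc %d" % n, 0) for n in range(20)]
two_class = [("doc %d" % n, n % 2) for n in range(20)]

one_class_counts = training_label_counts(one_class, 5)
two_class_counts = training_label_counts(two_class, 5)
print(one_class_counts[0])
print(two_class_counts[0])
```

If every Counter has a single key, the problem is in how the labels are loaded or indexed rather than in the fold arithmetic.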