[nltk] Naive Bayes Classifier - Printable Version

[nltk] Naive Bayes Classifier - constantin01 - Jun-24-2019

Hello

I am using nltk.NaiveBayesClassifier for opinion (sentiment) analysis, and I have run into a problem.

What I do:

1. Take lists of negative and positive words and shuffle them.

2. Use the NLTK movie_reviews corpus

from nltk.corpus import movie_reviews

# One (word list, category) pair per review
docs = [(list(movie_reviews.words(fileid)), category)
        for category in movie_reviews.categories()
        for fileid in movie_reviews.fileids(category)]

3. Function to represent a text as a vector of features

def vector(doc):
    doc_words = set(doc)  # set for fast membership tests
    vect = {}
    for w in words:  # words = pos_words + neg_words
        vect[w] = (w in doc_words)
    return vect
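
To illustrate steps 1 and 3, here is a minimal self-contained sketch on toy data. The short pos_words and neg_words lists and the sample document are made-up placeholders, not the real lists, and the word list is passed in as a parameter (an assumption of this sketch) instead of being read from a global.

```python
import random

# Hypothetical toy lexicons; the real lists are much longer.
pos_words = ["good", "outstanding", "seamless"]
neg_words = ["bad", "ludicrous", "insulting"]

words = pos_words + neg_words
random.shuffle(words)  # shuffling changes key order only, not the dict's contents


def vector(doc, words):
    """Map each lexicon word to True/False depending on its presence in doc."""
    doc_words = set(doc)  # set for fast membership tests
    return {w: (w in doc_words) for w in words}


sample_doc = ["the", "plot", "was", "good", "but", "the", "ending", "was", "ludicrous"]
features = vector(sample_doc, words)

# Show only the lexicon words that occur in the document.
print(sorted(w for w, present in features.items() if present))
# → ['good', 'ludicrous']
```

Every lexicon word gets an entry, so a document that contains none of them still produces a full dictionary of False values.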

4. Take all labelled reviews and represent them as (feature vector, label) pairs

5. Train the classifier
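
Steps 4 and 5 can be sketched roughly as follows. This is only an illustration on toy data: the four tiny documents and the four-word list are hypothetical stand-ins for the labelled movie reviews and the real word lists, and vector takes the word list as a parameter here so the block is self-contained.

```python
import nltk

# Hypothetical toy data standing in for the labelled movie reviews.
words = ["good", "outstanding", "bad", "ludicrous"]
docs = [
    (["a", "good", "and", "outstanding", "film"], "pos"),
    (["good", "fun"], "pos"),
    (["a", "bad", "and", "ludicrous", "plot"], "neg"),
    (["bad", "acting"], "neg"),
]


def vector(doc, words):
    """Map each lexicon word to True/False depending on its presence in doc."""
    doc_words = set(doc)
    return {w: (w in doc_words) for w in words}


# Step 4: one (feature dict, label) pair per document.
train_set = [(vector(doc, words), label) for doc, label in docs]

# Step 5: train the classifier on those pairs.
classifier = nltk.NaiveBayesClassifier.train(train_set)

# Classify a new text using the same feature representation as in training.
prediction = classifier.classify(vector(["good", "film"], words))
print(prediction)
```

NaiveBayesClassifier.train expects a list of (featureset, label) tuples, where each featureset is a dict mapping feature names to values.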

>>> classifier.show_most_informative_features()
Most Informative Features
              astounding = 1                 pos : neg    =     12.3 : 1.0
             outstanding = 1                 pos : neg    =     11.5 : 1.0
               ludicrous = 1                 neg : pos    =     11.0 : 1.0
             fascination = 1                 pos : neg    =     11.0 : 1.0
               insulting = 1                 neg : pos    =     11.0 : 1.0
                   sucks = 1                 neg : pos    =     10.6 : 1.0
                seamless = 1                 pos : neg    =     10.3 : 1.0
                  hatred = 1                 pos : neg    =     10.3 : 1.0
                   dread = 1                 pos : neg    =      9.7 : 1.0
              accessible = 1                 pos : neg    =      9.7 : 1.0

TEST:

sent1 = {'good': 1}  # just one word, "good"
>>> classifier.classify(sent1)
'neg'

Fail!

What is wrong?