 [nltk] Naive Bayes Classifier
#1
Hello

I'm using nltk.NaiveBayesClassifier to do sentiment (opinion) analysis, and I've run into a problem.

What I do:

1. Take lists of negative and positive words and shuffle them (a rough sketch is below).
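
Roughly like this; the file names pos_words.txt / neg_words.txt and the one-word-per-line format are just placeholders for whatever word lists are actually used:

import random

# Assumed word-list files, one word per line (file names are placeholders)
with open('pos_words.txt') as f:
    pos_words = [line.strip() for line in f if line.strip()]
with open('neg_words.txt') as f:
    neg_words = [line.strip() for line in f if line.strip()]

words = pos_words + neg_words
random.shuffle(words)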

2. Use the NLTK movie_reviews corpus:

from nltk.corpus import movie_reviews

docs = [(list(movie_reviews.words(fileid)), category)
        for category in movie_reviews.categories()
        for fileid in movie_reviews.fileids(category)]

3. A function that represents a document as a vector of features:


def vector(doc):
    doc_words = set(doc)
    vect = {}
    for w in words:  # words = pos_words + neg_words
        vect[w] = (w in doc_words)
    return vect
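
For example (assuming 'good' and 'boring' are both in words):

>>> v = vector(['a', 'good', 'movie'])
>>> v['good']
True
>>> v['boring']
False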

4. Take all labelled reviews and represent them as (feature vector, label) pairs, as sketched below.
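
In code it's roughly this, reusing docs and vector() from above; the 90/10 split into train_set and test_set is just an assumption for illustration (movie_reviews has 2000 reviews):

featuresets = [(vector(d), label) for (d, label) in docs]
train_set, test_set = featuresets[:1800], featuresets[1800:]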

5. Train the classifier (roughly as shown below):
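
Something like this, using train_set and test_set from step 4:

import nltk

classifier = nltk.NaiveBayesClassifier.train(train_set)
print(nltk.classify.accuracy(classifier, test_set))  # sanity check on the held-out set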

>>> classifier.show_most_informative_features()
Most Informative Features
              astounding = 1                 pos : neg    =     12.3 : 1.0
             outstanding = 1                 pos : neg    =     11.5 : 1.0
               ludicrous = 1                 neg : pos    =     11.0 : 1.0
             fascination = 1                 pos : neg    =     11.0 : 1.0
               insulting = 1                 neg : pos    =     11.0 : 1.0
                   sucks = 1                 neg : pos    =     10.6 : 1.0
                seamless = 1                 pos : neg    =     10.3 : 1.0
                  hatred = 1                 pos : neg    =     10.3 : 1.0
                   dread = 1                 pos : neg    =      9.7 : 1.0
              accessible = 1                 pos : neg    =      9.7 : 1.0


TEST:

sent1 = {'good': 1}  # just one word, "good"
>>> classifier.classify(sent1)
'neg'
Fail!

What is wrong?