Jun-24-2019, 10:36 AM
Hello
I'm using nltk.NaiveBayesClassifier to do sentiment analysis, and I've run into a problem.
What I do:
1. Take lists of positive and negative words and shuffle them.
2. Use the NLTK movie_reviews corpus
docs = [(list(movie_reviews.words(fileid)), category)
        for category in movie_reviews.categories()
        for fileid in movie_reviews.fileids(category)]
3. Function to represent a text as a vector of features
def vector(doc):
    doc_words = set(doc)
    vect = {}
    for w in words:  # words = pos_words + neg_words
        vect[w] = (w in doc_words)
    return vect
4. Take all labelled reviews and represent them as (feature vector, label) pairs
5. Train the classifier
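To make steps 1, 4 and 5 concrete, here is a minimal self-contained sketch of the pipeline (toy word lists and toy documents stand in for my real lexicon and the movie_reviews corpus; variable names are illustrative):

```python
import random
import nltk

# Step 1: toy stand-ins for the real positive/negative word lists.
pos_words = ['good', 'great', 'astounding']
neg_words = ['bad', 'awful', 'ludicrous']
words = pos_words + neg_words
random.shuffle(words)

# Step 3: represent a document as a dict of binary word features.
def vector(doc):
    doc_words = set(doc)
    return {w: (w in doc_words) for w in words}

# Toy labelled documents standing in for the movie_reviews corpus.
docs = [
    (['a', 'good', 'great', 'film'], 'pos'),
    (['an', 'astounding', 'good', 'movie'], 'pos'),
    (['a', 'bad', 'awful', 'film'], 'neg'),
    (['a', 'ludicrous', 'bad', 'plot'], 'neg'),
]

# Step 4: (feature vector, label) pairs.
featuresets = [(vector(d), label) for d, label in docs]

# Step 5: train the classifier.
classifier = nltk.NaiveBayesClassifier.train(featuresets)

print(classifier.classify(vector(['good', 'film'])))  # prints: pos
```

Note that the test input here is built with the same vector() function used for training, so every feature key is present.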
>>> classifier.show_most_informative_features()
Most Informative Features
   astounding = 1    pos : neg  =  12.3 : 1.0
  outstanding = 1    pos : neg  =  11.5 : 1.0
    ludicrous = 1    neg : pos  =  11.0 : 1.0
  fascination = 1    pos : neg  =  11.0 : 1.0
    insulting = 1    neg : pos  =  11.0 : 1.0
        sucks = 1    neg : pos  =  10.6 : 1.0
     seamless = 1    pos : neg  =  10.3 : 1.0
       hatred = 1    pos : neg  =  10.3 : 1.0
        dread = 1    pos : neg  =   9.7 : 1.0
   accessible = 1    pos : neg  =   9.7 : 1.0

TEST:
sent1 = {'good': 1}  # just the one word "good"
>>> classifier.classify(sent1)
'neg'
Fail!
What is wrong?