[nltk] Naive Bayes Classifier - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: [nltk] Naive Bayes Classifier (/thread-19341.html)
[nltk] Naive Bayes Classifier - constantin01 - Jun-24-2019

Hello. I use nltk.NaiveBayesClassifier to do opinion (sentiment) analysis, and I have a problem. What I do:

1. Take lists of negative and positive words and shuffle them.

2. Use the NLTK movie_reviews corpus:

docs = [(list(movie_reviews.words(fileid)), category)
        for category in movie_reviews.categories()
        for fileid in movie_reviews.fileids(category)]

3. A function to represent a text as a vector of features:

def vector(doc):
    doc_words = set(doc)
    vect = {}
    for w in words:  # words = pos_words + neg_words
        vect[w] = (w in doc_words)
    return vect

4. Take all labelled reviews and represent them as (feature vector, label) pairs.

5. Train the classifier:

>>> classifier.show_most_informative_features()
Most Informative Features
    astounding = 1    pos : neg = 12.3 : 1.0
   outstanding = 1    pos : neg = 11.5 : 1.0
     ludicrous = 1    neg : pos = 11.0 : 1.0
   fascination = 1    pos : neg = 11.0 : 1.0
     insulting = 1    neg : pos = 11.0 : 1.0
         sucks = 1    neg : pos = 10.6 : 1.0
      seamless = 1    pos : neg = 10.3 : 1.0
        hatred = 1    pos : neg = 10.3 : 1.0
         dread = 1    pos : neg =  9.7 : 1.0
    accessible = 1    pos : neg =  9.7 : 1.0

TEST:

sent1 = {'good': 1}  # just one word, "good"
>>> classifier.classify(sent1)
'neg'

Fail! What is wrong?
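For anyone following along, here is a minimal, self-contained sketch of the boolean feature-vector scheme from steps 3-5. The vocabulary, toy "reviews", and the tiny Naive Bayes implementation are all invented for illustration (NLTK and the movie_reviews corpus are not needed to run it); it is not the poster's actual code, just the same idea in miniature:

```python
# Toy Naive Bayes over boolean word features, mirroring the
# vector()/train flow described above.  Vocabulary and documents
# are made up for illustration.
from collections import defaultdict
import math

vocab = ["good", "great", "bad", "awful"]  # stand-in for pos_words + neg_words

def vector(doc_words):
    """Represent a document as {word: bool} over the whole vocabulary."""
    present = set(doc_words)
    return {w: (w in present) for w in vocab}

train_docs = [
    (["good", "great"], "pos"),
    (["great", "good", "good"], "pos"),
    (["bad", "awful"], "neg"),
    (["awful", "bad", "bad"], "neg"),
]
train_set = [(vector(words), label) for words, label in train_docs]

def train(train_set):
    """Count labels and (feature, value) pairs per label."""
    label_counts = defaultdict(int)
    feat_counts = defaultdict(lambda: defaultdict(int))
    for feats, label in train_set:
        label_counts[label] += 1
        for f, v in feats.items():
            feat_counts[label][(f, v)] += 1
    return label_counts, feat_counts

def classify(model, feats):
    """Pick the label maximizing log prior + sum of log likelihoods."""
    label_counts, feat_counts = model
    total = sum(label_counts.values())
    best, best_lp = None, float("-inf")
    for label, n in label_counts.items():
        lp = math.log(n / total)
        for f, v in feats.items():
            # Laplace smoothing: (count + 1) / (n + 2) for boolean features.
            lp += math.log((feat_counts[label][(f, v)] + 1) / (n + 2))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

model = train(train_set)
print(classify(model, vector(["good"])))  # prints "pos"
```

Note one detail that may bear on the failing test: here the one-word "review" is still passed through vector(), so the feature dict carries an explicit False for every other vocabulary word. The post's hand-built sent1 = {'good': 1} omits all those entries, and NLTK's classifier simply skips features it is not given, so a single-key dict is not equivalent to a full feature vector built the same way as the training data. That mismatch is a likely source of the surprising 'neg' result.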