Python Forum
Naive Bayes probabilities very small
#1
Hi,

I have been trying to code a Naive Bayes classifier from scratch.
My training set is a number of sentences, each labelled with one or more emotions, e.g.:
I love running - Joy, Fun
I hate exercising. I prefer to dance - Joy, Hate, Happiness

I created a Naive Bayes classifier for each emotion and now I am testing them. I would like to check how accurate my classifiers are by looking at the probability each one returns, but the returned probability is very small, i.e. almost 0. I would like a test sentence to be scored against P(Joy), P(Fun), P(Hate) and P(Happiness), and for each classifier, if the probability is > 0.5, conclude that the sentence contains that emotion.

Any idea what I am doing wrong, please?
import re
import nltk
from collections import Counter
from nltk.corpus import stopwords

def make_emotion_prediction(text, counts, emotion_prob, emotion_count):
    prediction = 1
    text = re.sub(r'[^\w\s]', ' ', text)  # remove punctuation
    text = ''.join([c for c in text if not c.isdigit()])  # remove numbers
    text = nltk.word_tokenize(text.lower())  # lowercase and tokenize
    stop = set(stopwords.words('english'))  # remove stopwords
    text = [t for t in text if t not in stop]
    text = [t for t in text if len(t) > 1]  # drop single-character tokens
    new_text = " ".join(text)
    text_counts = Counter(re.split(r"\s+", new_text))
    for word in text_counts:
        # For every word in the text, take the number of times that word
        # occurred in the training sentences for this class, add 1 to smooth
        # the value, and divide by the total number of words in the class
        # (plus emotion_count, to smooth the denominator as well).
        # Smoothing ensures we don't multiply the prediction by 0 when a
        # word never appeared in the training data.
        prediction *= text_counts[word] * ((counts.get(word, 0) + 1) / float(sum(counts.values()) + emotion_count))
    # Finally, multiply by the prior probability of the class.
    return prediction * emotion_prob
This is just one call, for one emotion:
anger_prediction = make_emotion_prediction(text, anger_counts, prob_anger, anger_sent_count)
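A note on why the numbers come out so small: multiplying many per-word probabilities (each below 1) drives the product toward 0 as sentences get longer. A common remedy is to sum log probabilities instead of multiplying raw ones. Below is a minimal sketch of a log-space variant; the function name and the toy counts are made up for illustration, and the word count is used as an exponent (`n * log p`, the standard multinomial form) where the original multiplied by the count:

```python
import math
from collections import Counter

def make_emotion_log_prediction(tokens, counts, emotion_prob, emotion_count):
    # Hypothetical log-space variant: sum log probabilities instead of
    # multiplying raw ones, so the score does not underflow toward 0.
    total = float(sum(counts.values()) + emotion_count)
    log_pred = math.log(emotion_prob)
    for word, n in Counter(tokens).items():
        # Laplace-smoothed per-word probability, as in the original code
        log_pred += n * math.log((counts.get(word, 0) + 1) / total)
    return log_pred

# Toy numbers (made up): word counts for one emotion class
counts = Counter({"love": 3, "running": 2})
score = make_emotion_log_prediction(["love", "running"], counts, 0.5, 4)
```

The score is now a (negative) log value rather than a vanishingly small product; larger (closer to 0) means more likely.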
This is how anger_counts, prob_anger and anger_sent_count were calculated:

df2 = df[df['Anger'] == 1]
anger = df2['Sentence'].tolist()
angerWords = get_text(anger)
anger_counts = count_text(angerWords)

anger_sent_count = len(df2)

prob_anger = anger_sent_count / float(len(df))
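On the "> 0.5" test: the value returned by the classifier is only proportional to P(emotion | text), not a probability itself, so comparing it to 0.5 directly does not work. One option (an assumption on my part, not something from the post) is to also score the complement class ("not anger") and normalize the two scores; with log-space scores the log-sum-exp trick keeps this numerically stable:

```python
import math

def posterior_from_log_scores(log_pos, log_neg):
    # Turn two unnormalized log scores, e.g. log[P(text|anger)P(anger)]
    # and log[P(text|not anger)P(not anger)], into a posterior for the
    # positive class. Subtracting the max first avoids underflow when
    # both log scores are large negative numbers.
    m = max(log_pos, log_neg)
    pos = math.exp(log_pos - m)
    neg = math.exp(log_neg - m)
    return pos / (pos + neg)
```

With equal scores this returns 0.5, and the result is always in [0, 1], so the "> 0.5 means the emotion is present" rule becomes meaningful.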