Naive Bayes too slow

**Larz60+** · Oct-22-2016, 09:06 PM

Quote: is there no way to "rebuild" scikit-learn in the proper manner?

Sure, if you're one of the authors of 'scikit-learn'.

Just as you can't fix any of the bugs in MS windows, you cannot fix bugs in scikit's program.

What you can do is work around the bug.

That's what the link is showing you how to do.

pythlang · Oct-22-2016, 09:09 PM

(Oct-22-2016, 09:06 PM)Larz60+ Wrote:
Quote: is there no way to "rebuild" scikit-learn in the proper manner?
Sure, if you're one of the authors of 'scikit-learn'. Just as you can't fix any of the bugs in MS windows, you cannot fix bugs in scikit's program. What you can do is work around the bug. That's what the link is showing you how to do.

hahaha ok now I understand... touché. thank you, sir.

**Larz60+** · Oct-22-2016, 09:15 PM

Actually I should clarify.
If they supply the source code you could technically modify and rebuild their package.
But this is a big No-no that will bite you in the dupa at some point.
Next update, your fix would be gone!

pythlang · Oct-22-2016, 09:29 PM

(Oct-22-2016, 09:15 PM)Larz60+ Wrote: Actually I should clarify. If they supply the source code you could technically modify and rebuild their package. But this is a big No-no that will bite you in the dupa at some point. Next update, your fix would be gone!

ughhhhhhhhh i'm forever doomed i guess....

im going to try to re-read your link and see if i can make sense of it to use scipy instead as that's how i read it

pythlang · (This post was last modified: Oct-24-2016, 01:01 AM by pythlang.)

(Oct-22-2016, 09:29 PM)pythlang Wrote:
(Oct-22-2016, 09:15 PM)Larz60+ Wrote: Actually I should clarify. If they supply the source code you could technically modify and rebuild their package. But this is a big No-no that will bite you in the dupa at some point. Next update, your fix would be gone!
ughhhhhhhhh i'm forever doomed i guess.... im going to try to re-read your link and see if i can make sense of it to use scipy instead as that's how i read it

soooooo being the little A-hole (please don't infract me) that I am, I finally got scikit-learn and numpy to function properly.

after several meetings with several of my friends that have BS in CS and none of them having any idea of how to fix this, I literally have come to the conclusion that I'm just dumb lucky.

however, i am still receiving an error. anyone know why?

One of them suggested installing Homebrew. i figured, why not, i don't really understand this stuff anyway so what do i have to lose. I installed Homebrew and then proceeded to mega download a lot of module packages from a website i cannot currently find on one of the 57 tabs i have open.

i ran so many commands over the last few days of trying everything and anything to get it figured out, but for the sake of brevity i shan't post them all; instead i'll tell you what i did in the last few minutes that got it working.

i ran the following commands in this sequence:

pip uninstall numpy

pip uninstall scikit-learn

brew uninstall numpy

brew uninstall scikit-learn

then, i went into my folders and manually deleted every scikit-learn and sklearn directory i could find. then i ran

brew install numpy

brew install scikit-learn

and lo and behold, my code ran with the following inputs/outputs:

import nltk
import random
from nltk.corpus import movie_reviews
from nltk.classify.scikitlearn import SklearnClassifier
import pickle
from sklearn.naive_bayes import MultinomialNB, GaussianNB, BernoulliNB


documents = [(list(movie_reviews.words(fileid)), category)
            for category in movie_reviews.categories()
            for fileid in movie_reviews.fileids(category)]

random.shuffle(documents)

all_words = []
for w in movie_reviews.words():
   all_words.append(w.lower())

all_words = nltk.FreqDist(all_words)

word_features = list(all_words.keys())[:3000]

def find_features(document):
   words = set(document)
   features = {}
   for w in word_features:
       features[w] = (w in words)

   return features

# print((find_features(movie_reviews.words('neg/cv000_29416.txt'))))

featuresets = [(find_features(rev), category) for (rev, category) in documents]

training_set = featuresets[:1900]
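# NB: [:1900:] is the same slice as [:1900], so this evaluates on the training data;
# featuresets[1900:] would hold out the remaining reviews instead.
testing_set = featuresets[:1900:]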

# classifier = nltk.NaiveBayesClassifier.train(training_set)

classifier_f = open("naivebayes.pickle", "rb")
classifier = pickle.load(classifier_f)
classifier_f.close()

print("Original Naive Bayes Algo accuracy percent:", (nltk.classify.accuracy(classifier, testing_set))*100)
classifier.show_most_informative_features(15)

# save_classifier = open("naivebayes.pickle", "wb")
# pickle.dump(classifier, save_classifier)
# save_classifier.close()

MNB_classifier = SklearnClassifier(MultinomialNB())
MNB_classifier.train(training_set)
print("MNB_classifier accuracy percent:", (nltk.classify.accuracy(MNB_classifier, testing_set))*100)

GaussianNB_classifier = SklearnClassifier(GaussianNB())
GaussianNB_classifier.train(training_set)
print("GaussianNB_classifier:", (nltk.classify.accuracy(GaussianNB_classifier, testing_set))*100)

BernoulliNB_classifier = SklearnClassifier(BernoulliNB())
BernoulliNB_classifier.train(training_set)
print("BernoulliNB_classifier:", (nltk.classify.accuracy(BernoulliNB_classifier, testing_set))*100)

Output:
('Original Naive Bayes Algo accuracy percent:', 87.36842105263159)
Most Informative Features
              insulting = True              neg : pos    =     11.0 : 1.0
                   sans = True              neg : pos    =      9.0 : 1.0
           refreshingly = True              pos : neg    =      8.4 : 1.0
                wasting = True              neg : pos    =      8.3 : 1.0
             mediocrity = True              neg : pos    =      7.7 : 1.0
              dismissed = True              pos : neg    =      7.0 : 1.0
                customs = True              pos : neg    =      6.3 : 1.0
                 fabric = True              pos : neg    =      6.3 : 1.0
            overwhelmed = True              pos : neg    =      6.3 : 1.0
            bruckheimer = True              neg : pos    =      6.3 : 1.0
                  wires = True              neg : pos    =      6.3 : 1.0
              uplifting = True              pos : neg    =      6.2 : 1.0
                    ugh = True              neg : pos    =      5.8 : 1.0
                 stinks = True              neg : pos    =      5.8 : 1.0
                   lang = True              pos : neg    =      5.7 : 1.0
('MNB_classifier accuracy percent:', 89.6842105263158)

Error:
Traceback (most recent call last):
 File "/Users/jordanXXX/Documents/NLP/scikitlearn", line 56, in <module>
   GaussianNB_classifier.train(training_set)
 File "/Library/Python/2.7/site-packages/nltk/classify/scikitlearn.py", line 117, in train
   self._clf.fit(X, y)
 File "/usr/local/lib/python2.7/site-packages/sklearn/naive_bayes.py", line 182, in fit
   X, y = check_X_y(X, y)
 File "/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 521, in check_X_y
   ensure_min_features, warn_on_dtype, estimator)
 File "/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 380, in check_array
   force_all_finite)
 File "/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 243, in _ensure_sparse_format
   raise TypeError('A sparse matrix was passed, but dense '
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
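
The traceback itself points at the likely cause: GaussianNB wants a dense array, while the feature dicts are being vectorized into a sparse matrix before `fit` is called. Below is a minimal sketch of the dense route using scikit-learn directly, assuming toy feature dicts shaped like the output of `find_features()` (the words and labels are made up for illustration). As far as I know, NLTK's `SklearnClassifier` constructor also takes a `sparse` argument that it hands to its internal `DictVectorizer`, so `SklearnClassifier(GaussianNB(), sparse=False)` may be the one-line change in the script above, but check that against the installed NLTK version.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import GaussianNB

# Toy (features, label) pairs in the same shape as featuresets (made-up data).
train = [
    ({"great": True, "awful": False}, "pos"),
    ({"great": False, "awful": True}, "neg"),
    ({"great": True, "awful": False}, "pos"),
    ({"great": False, "awful": True}, "neg"),
]
feature_dicts = [features for features, _ in train]
labels = [label for _, label in train]

# sparse=False makes the vectorizer return a dense numpy array,
# which is what GaussianNB.fit() requires.
vectorizer = DictVectorizer(dtype=float, sparse=False)
X = vectorizer.fit_transform(feature_dicts)

clf = GaussianNB()
clf.fit(X, labels)

# Classify one new feature dict with the same vectorizer.
new_doc = vectorizer.transform([{"great": True, "awful": False}])
prediction = clf.predict(new_doc)[0]
print(prediction)
```

A dense matrix of 1900 reviews by 3000 features is small enough to fit in memory comfortably, so going dense should be fine at this scale; MultinomialNB and BernoulliNB accept sparse input, which would explain why only the Gaussian one failed.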

**Larz60+** · Oct-24-2016, 03:15 AM

I don't know why I didn't check earlier, but there is a huge repository of wheels available at UC Irvine
if you go to the scikit-learn section you will find wheels for all releases of python up to 3.6 (which I didn't even know was out there)

a typical name will look like: scikit_learn-0.18-cp35-cp35m-win_amd64.whl
the cp35 indicates Python version 3.5; the amd64 is for 64-bit Windows.

download the version you need, switch to the directory where you downloaded the wheel,
and run pip install wheelname
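
To pick the right file, it can help to split the wheel name into its parts. Here is a rough sketch that follows the standard wheel naming layout (name, version, optional build tag, Python tag, ABI tag, platform tag); `parse_wheel_name` is a hypothetical helper written for this post, not part of pip.

```python
def parse_wheel_name(filename):
    """Split a wheel file name into its dash-separated fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected number of fields in %r" % filename)
    return {
        "name": name,
        "version": version,
        "build": build,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


info = parse_wheel_name("scikit_learn-0.18-cp35-cp35m-win_amd64.whl")
print(info["python_tag"], info["platform_tag"])
```

The Python tag has to match the interpreter you run pip with, and the platform tag has to match your operating system and bitness, otherwise pip will refuse the file.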

It does mention that numpy+mkl must be installed first. (they have the wheel for that also).

I have used this site a lot, it will bring in all required packages.

It has never failed for me

pythlang · Oct-24-2016, 12:44 PM

(Oct-24-2016, 03:15 AM)Larz60+ Wrote: I don't know why I didn't check earlier, but there is a huge repository of wheels available at UC Irvine
if you go to the scikit-learn section you will find wheels for all releases of python up to 3.6 (which I didn't even know was out there)

a typical name will look like: scikit_learn-0.18-cp35-cp35m-win_amd64.whl
the cp35 indicates Python version 3.5; the amd64 is for 64-bit Windows.

download the version you need, switch to the directory where you downloaded the wheel,
and run pip install wheelname

It does mention that numpy+mkl must be installed first. (they have the wheel for that also).

I have used this site a lot, it will bring in all required packages.

It has never failed for me

Nice thank you! I also read numpy+mkl must be installed first

I am using Mac OSX.... is this website compatible?

***sparkz_alot*** · Oct-24-2016, 01:20 PM

No, that site is strictly for windows. The Mac should be similar to linux, though you may have to install a compiler (such as gcc) first. Not sure, though, as I don't have a Mac

**Larz60+** · Oct-24-2016, 02:58 PM

Last Apple I worked on was a IIe.
I hear that OS X is a spin-off of Unix?? Is that true?

pythlang · Oct-24-2016, 03:06 PM

(Oct-24-2016, 01:20 PM)sparkz_alot Wrote: No, that site is strictly for windows. The Mac should be similar to linux, though you may have to install a compiler (such as gcc) first. Not sure, though, as I don't have a Mac

Thank you for the heads up.

I recall when I did the re install I thought my computer froze on something called

"Gcc making bootstrap"

For like 60 mins but I looked it up and people said it just took a while for the 471 MB..

I'm on my way home to do some tinkering so let me see what I can definitively find and try then post if I need direction.

You guys are the best, thanks
