Python Forum
Naive Bayes too slow
#11
Quote:is there no way to "rebuild" scikit-learn in the proper manner?

Sure, if you're one of the authors of 'scikit-learn'.

Just as you can't fix any of the bugs in MS Windows, you cannot fix bugs in scikit-learn's code.

What you can do is work around the bug.

That's what the link is showing you how to do.
Reply
#12
(Oct-22-2016, 09:06 PM)Larz60+ Wrote:
Quote: is there no way to "rebuild" scikit-learn in the proper manner?
Sure, if you're one of the authors of 'scikit-learn'. Just as you can't fix any of the bugs in MS windows, you cannot fix bugs in scikit's program. What you can do is work around the bug. That's what the link is showing you how to do.


hahaha ok now I understand... touché. thank you, sir.
Reply
#13
Actually I should clarify.
If they supply the source code you could technically modify and rebuild their package.
But this is a big No-no that will bite you in the dupa at some point.
Next update, your fix would be gone!
Reply
#14
(Oct-22-2016, 09:15 PM)Larz60+ Wrote: Actually I should clarify. If they supply the source code you could technically modify and rebuild their package. But this is a big No-no that will bite you in the dupa at some point. Next update, your fix would be gone!

ughhhhhhhhh i'm forever doomed i guess....

i'm going to re-read your link and see if i can make sense of using scipy instead, since that's how i read it
Reply
#15
(Oct-22-2016, 09:29 PM)pythlang Wrote:
(Oct-22-2016, 09:15 PM)Larz60+ Wrote: Actually I should clarify. If they supply the source code you could technically modify and rebuild their package. But this is a big No-no that will bite you in the dupa at some point. Next update, your fix would be gone!
ughhhhhhhhh i'm forever doomed i guess.... im going to try to re-read your link and see if i can make sense of it to use scipy instead as that's how i read it


soooooo being the little A-hole (please don't infract me) that I am, I finally got scikit-learn and numpy to function properly.

after several meetings with friends who have a BS in CS, none of whom had any idea how to fix this, I've come to the conclusion that I'm just dumb lucky.

however, i am still receiving an error. anyone know why?

One of them suggested installing Homebrew. i figured, why not, i don't really understand this stuff anyway, so what do i have to lose? I installed Homebrew and then downloaded a lot of module packages from a website i cannot currently find among the 57 tabs i have open.

i ran so many commands over the last few days trying anything and everything to get this figured out, but for the sake of brevity i shan't post them all; instead i'll tell you what i did in the last few minutes that got it working.

i ran the following commands in this sequence:

pip uninstall numpy

pip uninstall scikit-learn

brew uninstall numpy

brew uninstall scikit-learn
then i went into my folders and manually deleted every scikit-learn and sklearn directory i could find. then i ran

brew install numpy

brew install scikit-learn
and lo and behold, my code ran with the following inputs/outputs:

import nltk
import random
from nltk.corpus import movie_reviews
from nltk.classify.scikitlearn import SklearnClassifier
import pickle
from sklearn.naive_bayes import MultinomialNB, GaussianNB, BernoulliNB


documents = [(list(movie_reviews.words(fileid)), category)
            for category in movie_reviews.categories()
            for fileid in movie_reviews.fileids(category)]

random.shuffle(documents)

all_words = []
for w in movie_reviews.words():
   all_words.append(w.lower())

all_words = nltk.FreqDist(all_words)

word_features = list(all_words.keys())[:3000]

def find_features(document):
   words = set(document)
   features = {}
   for w in word_features:
       features[w] = (w in words)

   return features

# print((find_features(movie_reviews.words('neg/cv000_29416.txt'))))

featuresets = [(find_features(rev), category) for (rev, category) in documents]

training_set = featuresets[:1900]
testing_set = featuresets[1900:]  # hold out the last 100 reviews; slicing [:1900] again would test on the training data

# classifier = nltk.NaiveBayesClassifier.train(training_set)

classifier_f = open("naivebayes.pickle", "rb")
classifier = pickle.load(classifier_f)
classifier_f.close()

print("Original Naive Bayes Algo accuracy percent:", (nltk.classify.accuracy(classifier, testing_set))*100)
classifier.show_most_informative_features(15)

# save_classifier = open("naivebayes.pickle", "wb")
# pickle.dump(classifier, save_classifier)
# save_classifier.close()

MNB_classifier = SklearnClassifier(MultinomialNB())
MNB_classifier.train(training_set)
print("MNB_classifier accuracy percent:", (nltk.classify.accuracy(MNB_classifier, testing_set))*100)

# GaussianNB requires a dense array, which is what triggers the TypeError below
GaussianNB_classifier = SklearnClassifier(GaussianNB())
GaussianNB_classifier.train(training_set)
print("GaussianNB_classifier:", (nltk.classify.accuracy(GaussianNB_classifier, testing_set))*100)

BernoulliNB_classifier = SklearnClassifier(BernoulliNB())
BernoulliNB_classifier.train(training_set)
print("BernoulliNB_classifier:", (nltk.classify.accuracy(BernoulliNB_classifier, testing_set))*100)
Output:
('Original Naive Bayes Algo accuracy percent:', 87.36842105263159)
Most Informative Features
               insulting = True              neg : pos    =     11.0 : 1.0
                    sans = True              neg : pos    =      9.0 : 1.0
            refreshingly = True              pos : neg    =      8.4 : 1.0
                 wasting = True              neg : pos    =      8.3 : 1.0
              mediocrity = True              neg : pos    =      7.7 : 1.0
               dismissed = True              pos : neg    =      7.0 : 1.0
                 customs = True              pos : neg    =      6.3 : 1.0
                  fabric = True              pos : neg    =      6.3 : 1.0
             overwhelmed = True              pos : neg    =      6.3 : 1.0
             bruckheimer = True              neg : pos    =      6.3 : 1.0
                   wires = True              neg : pos    =      6.3 : 1.0
               uplifting = True              pos : neg    =      6.2 : 1.0
                     ugh = True              neg : pos    =      5.8 : 1.0
                  stinks = True              neg : pos    =      5.8 : 1.0
                    lang = True              pos : neg    =      5.7 : 1.0
('MNB_classifier accuracy percent:', 89.6842105263158)
Error:
Traceback (most recent call last):
  File "/Users/jordanXXX/Documents/NLP/scikitlearn", line 56, in <module>
    GaussianNB_classifier.train(training_set)
  File "/Library/Python/2.7/site-packages/nltk/classify/scikitlearn.py", line 117, in train
    self._clf.fit(X, y)
  File "/usr/local/lib/python2.7/site-packages/sklearn/naive_bayes.py", line 182, in fit
    X, y = check_X_y(X, y)
  File "/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 521, in check_X_y
    ensure_min_features, warn_on_dtype, estimator)
  File "/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 380, in check_array
    force_all_finite)
  File "/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 243, in _ensure_sparse_format
    raise TypeError('A sparse matrix was passed, but dense '
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
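The traceback already hints at the fix: GaussianNB (unlike MultinomialNB and BernoulliNB) only accepts dense arrays, while nltk's SklearnClassifier hands it a sparse matrix. A minimal sketch of one possible workaround: Pipeline and FunctionTransformer are real scikit-learn classes, but wiring them up this way is my suggestion, not something from the thread.

```python
# Densify the sparse feature matrix before it reaches GaussianNB,
# using a small sklearn Pipeline as a wrapper.
import numpy as np
import scipy.sparse as sp
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.naive_bayes import GaussianNB

dense_gnb = Pipeline([
    # accept_sparse=True lets the transformer receive the sparse matrix;
    # toarray() then hands GaussianNB the dense ndarray it requires
    ("to_dense", FunctionTransformer(lambda X: X.toarray(), accept_sparse=True)),
    ("gnb", GaussianNB()),
])

# tiny stand-in for the vectorized movie-review features
X = sp.csr_matrix(np.array([[1., 0.], [0., 1.], [1., 1.], [0., 0.]]))
y = np.array([0, 1, 1, 0])
dense_gnb.fit(X, y)  # no TypeError: GaussianNB received a dense array
print(dense_gnb.predict(X))
```

In the script above you could then build the classifier as `GaussianNB_classifier = SklearnClassifier(dense_gnb)`, and `.train(training_set)` should no longer raise the TypeError. Be warned that the densified matrix can be large (3000 features by 1900 documents here), which is part of why GaussianNB is rarely used on text features.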
Reply
#16
I don't know why I didn't check earlier, but there is a huge repository of wheels available at UC Irvine
if you go to the scikit-learn section you will find wheels for all releases of python up to 3.6 (which I didn't even know was out there)

a typical name will look like: scikit_learn-0.18-cp35-cp35m-win_amd64.whl
the cp35 indicates Python 3.5; the amd64 is for 64-bit Windows.

download the version you need, switch to the directory where you downloaded the wheel,
and run pip install wheelname

It does mention that  numpy+mkl must be installed first. (they have the wheel for that also).

I have used this site a lot; it will bring in all required packages.

It has never failed for me.
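As an aside, those wheel filename tags can be decoded mechanically (PEP 427 defines the format). A minimal sketch, with a hypothetical `parse_wheel_name` helper, assuming the simple five-part form without the optional build tag:

```python
def parse_wheel_name(filename):
    """Split a wheel filename into its PEP 427 components.

    Assumes the simple 5-part form (no optional build tag):
    {name}-{version}-{python tag}-{abi tag}-{platform tag}.whl
    """
    stem = filename[:-len(".whl")]
    name, version, py_tag, abi_tag, platform_tag = stem.split("-")
    return {"name": name, "version": version,
            "python": py_tag, "abi": abi_tag, "platform": platform_tag}

info = parse_wheel_name("scikit_learn-0.18-cp35-cp35m-win_amd64.whl")
print(info["python"])    # cp35 -> built for CPython 3.5
print(info["platform"])  # win_amd64 -> 64-bit Windows only
```

pip runs this kind of check for you: it refuses to install a wheel whose tags don't match your interpreter and OS, which is why grabbing the right cp/platform combination matters.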
Reply
#17
(Oct-24-2016, 03:15 AM)Larz60+ Wrote: I don't know why I didn't check earlier, but there is a huge repository of wheels available at UC Irvine
if you go to the scikit-learn section you will find wheels for all releases of python up to 3.6 (which I didn't even know was out there)

a typical name will look like: scikit_learn-0.18-cp35-cp35m-win_amd64.whl
the cp35 indicates Python 3.5; the amd64 is for 64-bit Windows.

download the version you need, switch to the directory where you downloaded the wheel,
and run pip install wheelname

It does mention that  numpy+mkl must be installed first. (they have the wheel for that also).

I have used this site a lot, it will bring in all required packages.

It has never failed for me


Nice, thank you! I also read that numpy+mkl must be installed first.

I am using Mac OS X... is this website compatible?
Reply
#18
No, that site is strictly for windows. The Mac should be similar to linux, though you may have to install a compiler (such as gcc) first.  Not sure, though, as I don't have a Mac
If it ain't broke, I just haven't gotten to it yet.
OS: Windows 10, openSuse 42.3, freeBSD 11, Raspian "Stretch"
Python 3.6.5, IDE: PyCharm 2018 Community Edition
Reply
#19
Last Apple I worked on was a II-e.
I hear that OS X is a spin-off of Unix. Is that true?
Reply
#20
(Oct-24-2016, 01:20 PM)sparkz_alot Wrote: No, that site is strictly for windows. The Mac should be similar to linux, though you may have to install a compiler (such as gcc) first. Not sure, though, as I don't have a Mac

Thank you for the heads up.

I recall that during the reinstall, I thought my computer froze on something called

"gcc making bootstrap"

for like 60 minutes, but I looked it up and people said it just takes a while for the 471 MB.

I'm on my way home to do some tinkering so let me see what I can definitively find and try then post if I need direction.

You guys are the best, thanks
Reply


