(Oct-22-2016, 09:29 PM)pythlang Wrote: (Oct-22-2016, 09:15 PM)Larz60+ Wrote: Actually I should clarify. If they supply the source code you could technically modify and rebuild their package. But this is a big No-no that will bite you in the dupa at some point. Next update, your fix would be gone!
ughhhhhhhhh i'm forever doomed i guess.... i'm going to try to re-read your link and see if i can make sense of it and use scipy instead, as that's how i read it
soooooo being the little A-hole (please don't infract me) that I am, I finally got scikit-learn and numpy to function properly.
after several meetings with several of my friends who have a BS in CS, none of whom had any idea how to fix this, I have literally come to the conclusion that I just got dumb lucky.
however, i am still receiving an error. anyone know why?
One of them suggested installing Homebrew. i figured, why not, i don't really understand this stuff anyway so what do i have to lose. I installed Homebrew and then proceeded to mega download a lot of module packages from a website i cannot currently find on any of the 57 tabs i have open.
i ran so many commands over the last few days of trying everything and anything to get it figured out, but for the sake of brevity i shan't post them all; instead i'll tell you what i did in the last few minutes that got it working.
i ran the following commands in this sequence:
pip uninstall numpy
pip uninstall scikit-learn
brew uninstall numpy
brew uninstall scikit-learn
then, i went into my folders and manually deleted every scikit-learn and sklearn directory i could find. then i ran
brew install numpy
brew install scikit-learn
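Since pip and brew can each leave their own copy of a package on the machine, it can help to check which copy Python actually picks up. A minimal sketch, assuming Python 3; `module_location` is a made-up helper name for illustration, not part of numpy, scikit-learn, or nltk:

```python
import importlib.util


def module_location(name):
    """Return where Python would import `name` from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # For ordinary packages this is the path of the file that gets loaded.
    return spec.origin


if __name__ == "__main__":
    for name in ("numpy", "sklearn", "nltk"):
        print(name, "->", module_location(name))
```

If the printed paths point into different site-packages directories, the packages are coming from different installations.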
and lo and behold, my code ran with the following inputs/outputs:
import nltk
import random
from nltk.corpus import movie_reviews
from nltk.classify.scikitlearn import SklearnClassifier
import pickle
from sklearn.naive_bayes import MultinomialNB, GaussianNB, BernoulliNB
documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)
all_words = []
for w in movie_reviews.words():
    all_words.append(w.lower())
all_words = nltk.FreqDist(all_words)
word_features = list(all_words.keys())[:3000]
def find_features(document):
    words = set(document)
    features = {}
    for w in word_features:
        features[w] = (w in words)
    return features
# print((find_features(movie_reviews.words('neg/cv000_29416.txt'))))
featuresets = [(find_features(rev), category) for (rev, category) in documents]
training_set = featuresets[:1900]
# note: [:1900:] selects the same items as [:1900], so this is the training data again
testing_set = featuresets[:1900:]
# classifier = nltk.NaiveBayesClassifier.train(training_set)
classifier_f = open("naivebayes.pickle", "rb")
classifier = pickle.load(classifier_f)
classifier_f.close()
print("Original Naive Bayes Algo accuracy percent:", (nltk.classify.accuracy(classifier, testing_set))*100)
classifier.show_most_informative_features(15)
# save_classifier = open("naivebayes.pickle", "wb")
# pickle.dump(classifier, save_classifier)
# save_classifier.close()
MNB_classifier = SklearnClassifier(MultinomialNB())
MNB_classifier.train(training_set)
print("MNB_classifier accuracy percent:", (nltk.classify.accuracy(MNB_classifier, testing_set))*100)
GaussianNB_classifier = SklearnClassifier(GaussianNB())
GaussianNB_classifier.train(training_set)
print("GaussianNB_classifier:", (nltk.classify.accuracy(GaussianNB_classifier, testing_set))*100)
BernoulliNB_classifier = SklearnClassifier(BernoulliNB())
BernoulliNB_classifier.train(training_set)
print("BernoulliNB_classifier:", (nltk.classify.accuracy(BernoulliNB_classifier, testing_set))*100)
Output:
('Original Naive Bayes Algo accuracy percent:', 87.36842105263159)
Most Informative Features
   insulting = True   neg : pos = 11.0 : 1.0
        sans = True   neg : pos =  9.0 : 1.0
refreshingly = True   pos : neg =  8.4 : 1.0
     wasting = True   neg : pos =  8.3 : 1.0
  mediocrity = True   neg : pos =  7.7 : 1.0
   dismissed = True   pos : neg =  7.0 : 1.0
     customs = True   pos : neg =  6.3 : 1.0
      fabric = True   pos : neg =  6.3 : 1.0
 overwhelmed = True   pos : neg =  6.3 : 1.0
 bruckheimer = True   neg : pos =  6.3 : 1.0
       wires = True   neg : pos =  6.3 : 1.0
   uplifting = True   pos : neg =  6.2 : 1.0
         ugh = True   neg : pos =  5.8 : 1.0
      stinks = True   neg : pos =  5.8 : 1.0
        lang = True   pos : neg =  5.7 : 1.0
('MNB_classifier accuracy percent:', 89.6842105263158)
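One caveat on the accuracy numbers: `featuresets[:1900:]` is a slice with an empty step, so it selects the same items as `featuresets[:1900]`, and the classifiers are being scored on the same documents they were trained on. A held-out split would presumably be `featuresets[1900:]`. A minimal sketch with a stand-in list (2000 matches the number of reviews in the movie_reviews corpus; the integers are placeholders, not real feature sets):

```python
# Stand-in for featuresets: 2000 placeholder items.
featuresets = list(range(2000))

training_set = featuresets[:1900]
same_as_training = featuresets[:1900:]  # empty step: identical to [:1900]
held_out = featuresets[1900:]           # the remaining 100 items

print(same_as_training == training_set)    # True
print(len(held_out))                       # 100
print(set(held_out) & set(training_set))   # set(), i.e. no overlap
```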
Error:
Traceback (most recent call last):
  File "/Users/jordanXXX/Documents/NLP/scikitlearn", line 56, in <module>
    GaussianNB_classifier.train(training_set)
  File "/Library/Python/2.7/site-packages/nltk/classify/scikitlearn.py", line 117, in train
    self._clf.fit(X, y)
  File "/usr/local/lib/python2.7/site-packages/sklearn/naive_bayes.py", line 182, in fit
    X, y = check_X_y(X, y)
  File "/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 521, in check_X_y
    ensure_min_features, warn_on_dtype, estimator)
  File "/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 380, in check_array
    force_all_finite)
  File "/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 243, in _ensure_sparse_format
    raise TypeError('A sparse matrix was passed, but dense '
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
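The traceback points at the likely cause: NLTK's SklearnClassifier builds a sparse feature matrix by default, and GaussianNB only accepts dense input, while MultinomialNB got through with the sparse one. SklearnClassifier takes a `sparse` argument, so passing `sparse=False` looks like one way around it. A minimal sketch on a tiny made-up training set (the feature names and labels are invented for illustration), assuming Python 3 with reasonably recent nltk and scikit-learn:

```python
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.naive_bayes import GaussianNB

# Invented toy data in the same (feature dict, label) shape as featuresets.
toy_training_set = [
    ({"great": True, "awful": False}, "pos"),
    ({"great": True, "awful": False}, "pos"),
    ({"great": False, "awful": True}, "neg"),
    ({"great": False, "awful": True}, "neg"),
]

# sparse=False makes the wrapper hand scikit-learn a dense array,
# which is what GaussianNB requires.
gaussian_classifier = SklearnClassifier(GaussianNB(), sparse=False)
gaussian_classifier.train(toy_training_set)

print(gaussian_classifier.classify({"great": True, "awful": False}))
```

In the script above, the equivalent change would be `SklearnClassifier(GaussianNB(), sparse=False)` on the GaussianNB line; the dense array uses more memory than the sparse one.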