Oct-09-2019, 12:45 PM
I am trying to implement a model for fake news detection. The dataset I am using has been taken from this source:
https://drive.google.com/file/d/1er9NJTL...4a-_q/view
I am getting around 82% accuracy, which is low compared to other people's models. Is there a way to improve the accuracy of my model?
import numpy as np
import pandas as pd
import itertools
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.naive_bayes import MultinomialNB

# Load the dataset and separate the article text from the FAKE/REAL labels
df = pd.read_csv('news.csv')
print(df.head())
labels = df.label
features = df['text']

# Hold out 20% of the articles for testing
x_train, x_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=7)

# Convert the raw text into TF-IDF features
tfidf_vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7, analyzer='word', sublinear_tf=True)
tfidf_train = tfidf_vectorizer.fit_transform(x_train)
tfidf_test = tfidf_vectorizer.transform(x_test)
tfidf_test.shape

# Train a multinomial Naive Bayes classifier and evaluate it on the held-out set
clf = MultinomialNB().fit(tfidf_train, y_train)
y_pred = clf.predict(tfidf_test)
score = accuracy_score(y_test, y_pred)
print(f'Accuracy: {round(score*100, 2)}%')
print(confusion_matrix(y_test, y_pred, labels=['FAKE', 'REAL']))
Output:
Accuracy: 82.24%
array([[419, 219],
[ 6, 623]], dtype=int64)
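One variation I have been considering (not part of the results above) is swapping MultinomialNB for a linear classifier such as PassiveAggressiveClassifier on the same TF-IDF features. This is only a rough sketch of what I mean; it reuses the tfidf_train/tfidf_test matrices from the code above, and the max_iter value is an arbitrary choice on my part:

from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score

# Reuse the TF-IDF matrices built above; only the classifier changes
pac = PassiveAggressiveClassifier(max_iter=50, random_state=7)
pac.fit(tfidf_train, y_train)
pac_pred = pac.predict(tfidf_test)
print(f'PassiveAggressive accuracy: {round(accuracy_score(y_test, pac_pred) * 100, 2)}%')

Would something along these lines be expected to help, or is the gap more likely down to the features/preprocessing than the choice of classifier?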