Low accuracy for fake news detection model

shivani · Oct-09-2019, 12:45 PM

I am trying to implement a model for fake news detection. The dataset I am using is from this source:
https://drive.google.com/file/d/1er9NJTL...4a-_q/view

I am getting around 82% accuracy, which is low compared to other people's models. Is there a better way to improve my model's accuracy?

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the dataset: 'text' holds the article text, 'label' is FAKE or REAL
df = pd.read_csv('news.csv')
print(df.head())

labels = df.label
features = df['text']

# Hold out 20% of the articles for testing
x_train, x_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=7)

# Fit the TF-IDF vocabulary on the training set only, then reuse it for the test set
tfidf_vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7, analyzer='word', sublinear_tf=True)
tfidf_train = tfidf_vectorizer.fit_transform(x_train)
tfidf_test = tfidf_vectorizer.transform(x_test)
print(tfidf_test.shape)  # (number of test articles, vocabulary size)

clf = MultinomialNB().fit(tfidf_train, y_train)

y_pred = clf.predict(tfidf_test)
score = accuracy_score(y_test, y_pred)
print(f'Accuracy: {round(score*100,2)}%')

# Rows are true labels, columns are predicted labels
print(confusion_matrix(y_test, y_pred, labels=['FAKE', 'REAL']))

Output:

Accuracy: 82.24%
[[419 219]
 [  6 623]]
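Reading the matrix row by row (rows are the true labels, columns the predicted labels, in the order FAKE, REAL) shows where the accuracy is lost. The sketch below recomputes accuracy and per-class recall and precision from the four numbers printed above. It is plain Python; per_class_metrics is a hypothetical helper name, and sklearn.metrics.classification_report(y_test, y_pred) should report the same per-class precision and recall directly from the predictions.

```python
def per_class_metrics(cm, labels):
    """Accuracy plus per-class recall and precision from a square confusion matrix.

    cm[i][j] counts articles whose true label is labels[i] and whose
    predicted label is labels[j] (the layout sklearn's confusion_matrix uses).
    """
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(labels)))
    metrics = {'accuracy': correct / total}
    for i, label in enumerate(labels):
        true_count = sum(cm[i])                 # articles that really are `label`
        pred_count = sum(row[i] for row in cm)  # articles predicted as `label`
        metrics[label] = {
            'recall': cm[i][i] / true_count,
            'precision': cm[i][i] / pred_count,
        }
    return metrics


# Numbers copied from the MultinomialNB output above
nb_cm = [[419, 219],
         [6, 623]]
result = per_class_metrics(nb_cm, ['FAKE', 'REAL'])
print(f"accuracy: {result['accuracy']:.4f}")
for label in ('FAKE', 'REAL'):
    m = result[label]
    print(f"{label}: recall={m['recall']:.3f} precision={m['precision']:.3f}")
```

With these numbers it prints an accuracy of 0.8224, FAKE recall 0.657 (precision 0.986) and REAL recall 0.990 (precision 0.740). In other words, 219 of the 225 errors are FAKE articles predicted as REAL, so the model misses roughly a third of the fake articles while almost never mislabelling a real one.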

animeshagrawal2807 · Oct-10-2019, 12:09 PM

When it comes to accuracy, you have to try different methods and tune their parameters to see what works best. On this dataset the Multinomial Naive Bayes classifier does not perform well: most of its errors are FAKE articles predicted as REAL (219 of the 638 FAKE articles in the test set). After trying several models, I found that the Passive-Aggressive classifier gave about 93% accuracy.
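One way to make "try different methods" systematic is to put the vectorizer and each candidate classifier into a scikit-learn Pipeline and compare them with cross-validation rather than a single train/test split. This is a minimal sketch assuming the same news.csv columns as in the question; compare_models is a hypothetical helper, and the candidate list (MultinomialNB, PassiveAggressiveClassifier, LogisticRegression, LinearSVC) is only an illustrative choice.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, PassiveAggressiveClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def compare_models(texts, labels, cv=5):
    """Return the mean cross-validated accuracy of several TF-IDF + classifier pipelines."""
    # Illustrative candidates; swap in whatever models you want to compare
    candidates = {
        'MultinomialNB': MultinomialNB(),
        'PassiveAggressive': PassiveAggressiveClassifier(max_iter=50, random_state=7),
        'LogisticRegression': LogisticRegression(max_iter=1000),
        'LinearSVC': LinearSVC(),
    }
    scores = {}
    for name, clf in candidates.items():
        # The vectorizer lives inside the pipeline, so it is re-fitted on each
        # training fold and never sees the held-out fold
        pipe = make_pipeline(
            TfidfVectorizer(stop_words='english', max_df=0.7, sublinear_tf=True),
            clf,
        )
        fold_scores = cross_val_score(pipe, texts, labels, cv=cv, scoring='accuracy')
        scores[name] = fold_scores.mean()
    return scores
```

With the thread's data it would be called as compare_models(df['text'], df.label). Averaging over five folds gives a steadier comparison than one 80/20 split, where a single lucky or unlucky split can move accuracy by a point or two.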
Here is the implementation using PassiveAggressiveClassifier:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

df = pd.read_csv('news.csv')

labels = df.label
features = df['text']

# Same 80/20 split and TF-IDF settings as in the question, so the scores are comparable
x_train, x_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=7)

tfidf_vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7, analyzer='word', sublinear_tf=True)
tfidf_train = tfidf_vectorizer.fit_transform(x_train)
tfidf_test = tfidf_vectorizer.transform(x_test)

# PassiveAggressiveClassifier shuffles the training data after each epoch, so the
# accuracy can vary slightly between runs unless random_state is also set
pac = PassiveAggressiveClassifier(max_iter=50)
pac.fit(tfidf_train, y_train)

y_pred = pac.predict(tfidf_test)
score = accuracy_score(y_test, y_pred)
print(f'Accuracy: {round(score*100,2)}%')

# Rows are true labels, columns are predicted labels
print(confusion_matrix(y_test, y_pred, labels=['FAKE', 'REAL']))

Output:

Accuracy: 93.37%
[[594  44]
 [ 40 589]]
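The other half of the advice, tuning the parameters, can be done with GridSearchCV over the same kind of pipeline. This is a minimal sketch: tune_passive_aggressive is a hypothetical helper, and the grid values for max_df, ngram_range and the classifier's C (its maximum step size, which acts as regularization) are assumptions chosen for illustration, not settings known to beat 93% on this dataset. The grid has 3 × 2 × 3 = 18 combinations, so with cv=5 it fits 90 pipelines plus one final refit on all the data it is given.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


def tune_passive_aggressive(texts, labels, cv=5):
    """Grid-search TF-IDF and PassiveAggressiveClassifier settings by cross-validated accuracy."""
    pipe = Pipeline([
        ('tfidf', TfidfVectorizer(stop_words='english', sublinear_tf=True)),
        # random_state fixes the per-epoch shuffling so repeated searches agree
        ('clf', PassiveAggressiveClassifier(max_iter=50, random_state=7)),
    ])
    # Parameter names are '<step name>__<parameter>'; the values are illustrative
    param_grid = {
        'tfidf__max_df': [0.7, 0.8, 0.9],
        'tfidf__ngram_range': [(1, 1), (1, 2)],
        'clf__C': [0.01, 0.1, 1.0],
    }
    search = GridSearchCV(pipe, param_grid, cv=cv, scoring='accuracy')
    search.fit(texts, labels)  # refits the best pipeline on all of texts/labels
    return search
```

Called as search = tune_passive_aggressive(x_train, y_train), search.best_params_ and search.best_score_ give the chosen settings and their cross-validated accuracy, and search.score(x_test, y_test) evaluates the refitted pipeline on the same held-out 20% used above, so that number is directly comparable with 93.37%. Tuning only on the training split keeps the test set untouched until the final check.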
