Python Forum

Low accuracy for fake news detection model
I am trying to implement a model for fake news detection. The dataset I am using comes from this source:
https://drive.google.com/file/d/1er9NJTL...4a-_q/view

I am getting around 82% accuracy, which is low compared to other people's models. How can I improve the accuracy of my model?


import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Load the dataset and preview the first rows
df = pd.read_csv('news.csv')
print(df.head())

labels = df['label']
features = df['text']

# Hold out 20% of the articles for testing
x_train, x_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=7)

# Drop English stop words and terms that occur in more than 70% of documents
tfidf_vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7,
                                   analyzer='word', sublinear_tf=True)

# Fit the vocabulary on the training set only, then reuse it for the test set
tfidf_train = tfidf_vectorizer.fit_transform(x_train)
tfidf_test = tfidf_vectorizer.transform(x_test)

tfidf_test.shape  # (number of test documents, vocabulary size)

clf = MultinomialNB().fit(tfidf_train, y_train)

y_pred = clf.predict(tfidf_test)
score = accuracy_score(y_test, y_pred)
print(f'Accuracy: {round(score*100,2)}%')

# Rows are true labels, columns are predicted labels
confusion_matrix(y_test, y_pred, labels=['FAKE', 'REAL'])
Output:
Accuracy: 82.24%
array([[419, 219],
       [  6, 623]], dtype=int64)
To improve accuracy you have to try different models and tune their parameters to see what works best. On this dataset the Multinomial Naive Bayes classifier does not perform well: its confusion matrix shows that it labels roughly a third of the FAKE articles as REAL. After trying different models, the passive-aggressive classifier gave about 93% accuracy.
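To see where the Naive Bayes model loses accuracy, the confusion matrix from the question can be broken down into per-class recall. This is only a small sketch: the matrix values are copied from the posted output, and the helper name `per_class_recall` is made up for illustration.

```python
import numpy as np

# Confusion matrix posted for MultinomialNB. Rows are true labels and
# columns are predicted labels, both ordered ['FAKE', 'REAL'].
cm = np.array([[419, 219],
               [6, 623]])


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly (hypothetical helper)."""
    return matrix.diagonal() / matrix.sum(axis=1)


recall_fake, recall_real = per_class_recall(cm)
accuracy = cm.trace() / cm.sum()

print(f"Recall FAKE: {recall_fake:.3f}")  # 0.657
print(f"Recall REAL: {recall_real:.3f}")  # 0.990
print(f"Accuracy:    {accuracy:.4f}")     # 0.8224
```

Almost every REAL article is recognised, but about a third of the FAKE ones are predicted as REAL, so the errors are heavily one-sided.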
An implementation using the passive-aggressive classifier is given below:

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

df = pd.read_csv('news.csv')
#print(df.head())

labels = df['label']
features = df['text']

# Same split and TF-IDF settings as in the question, so the results are comparable
x_train, x_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=7)

tfidf_vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7,
                                   analyzer='word', sublinear_tf=True)

tfidf_train = tfidf_vectorizer.fit_transform(x_train)
tfidf_test = tfidf_vectorizer.transform(x_test)

# Only the classifier changes: passive-aggressive instead of Naive Bayes
pac = PassiveAggressiveClassifier(max_iter=50)
pac.fit(tfidf_train, y_train)

y_pred = pac.predict(tfidf_test)
score = accuracy_score(y_test, y_pred)
print(f'Accuracy: {round(score*100,2)}%')

# Rows are true labels, columns are predicted labels
confusion_matrix(y_test, y_pred, labels=['FAKE', 'REAL'])
Output:
Accuracy: 93.37%
array([[594,  44],
       [ 40, 589]], dtype=int64)
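The advice to try different models can be made a bit more systematic with cross-validation, so the comparison does not depend on a single train/test split. The sketch below is one possible way to do it, not what was actually run in this thread: the function name `compare_models`, the choice of `LogisticRegression` and `LinearSVC` as extra candidates, and the 5-fold default are all assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, PassiveAggressiveClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def compare_models(texts, labels, cv=5):
    """Return mean cross-validated accuracy per candidate (hypothetical helper)."""
    # Candidate list is an assumption; only the first two appear in the thread.
    candidates = {
        "MultinomialNB": MultinomialNB(),
        "PassiveAggressive": PassiveAggressiveClassifier(max_iter=50),
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "LinearSVC": LinearSVC(),
    }
    results = {}
    for name, model in candidates.items():
        # With the vectorizer inside the pipeline, it is refit on each
        # training fold, so the held-out fold never influences the vocabulary.
        pipeline = make_pipeline(
            TfidfVectorizer(stop_words="english", max_df=0.7, sublinear_tf=True),
            model,
        )
        scores = cross_val_score(pipeline, texts, labels, cv=cv, scoring="accuracy")
        results[name] = scores.mean()
    return results


# Possible usage with the dataset from the question:
#   import pandas as pd
#   df = pd.read_csv("news.csv")
#   for name, acc in compare_models(df["text"], df["label"]).items():
#       print(f"{name}: {acc:.4f}")
```

Whichever model comes out ahead can then be tuned further, for example by varying its regularisation parameter or the vectorizer's `ngram_range`.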