Oct-09-2019, 12:45 PM
I am trying to implement a model for fake news detection. The dataset I am using is taken from this source:
https://drive.google.com/file/d/1er9NJTL...4a-_q/view
I am getting around 82% accuracy, which is low compared to other people's models. Is there a way to improve the accuracy of my model?
import numpy as np
import pandas as pd
import itertools
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix

df = pd.read_csv('news.csv')
print(df.head())

labels = df.label
features = df['text']
x_train, x_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=7)

tfidf_vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7, analyzer='word', sublinear_tf=True)
tfidf_train = tfidf_vectorizer.fit_transform(x_train)
tfidf_test = tfidf_vectorizer.transform(x_test)
tfidf_test.shape

from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(tfidf_train, y_train)
y_pred = clf.predict(tfidf_test)

score = accuracy_score(y_test, y_pred)
print(f'Accuracy: {round(score*100,2)}%')
confusion_matrix(y_test, y_pred, labels=['FAKE','REAL'])

Output:
Accuracy: 82.24%
array([[419, 219],
       [  6, 623]], dtype=int64)
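To make the numbers above easier to read, here is a minimal sketch that recomputes the accuracy and a per-class breakdown directly from the confusion matrix. It assumes scikit-learn's convention that rows are true labels and columns are predicted labels, in the order ['FAKE', 'REAL'] passed to `confusion_matrix`. The variable names (`cm`, `recall`, `precision`) are only for illustration.

```python
import numpy as np

# Confusion matrix copied from the output above.
# Assumed layout: rows = true labels, columns = predicted labels,
# both in the order ['FAKE', 'REAL'].
cm = np.array([[419, 219],
               [6, 623]])

labels = ['FAKE', 'REAL']

# Overall accuracy: correct predictions (the diagonal) over all test samples.
accuracy = np.trace(cm) / cm.sum()

# Per-class recall: correct predictions for a class over its true count (row sum).
recall = np.diag(cm) / cm.sum(axis=1)

# Per-class precision: correct predictions over the predicted count (column sum).
precision = np.diag(cm) / cm.sum(axis=0)

print(f'Accuracy: {accuracy * 100:.2f}%')
for name, r, p in zip(labels, recall, precision):
    print(f'{name}: recall={r:.3f}, precision={p:.3f}')
```

Under that reading, almost all of the errors fall in one cell: 219 of the 638 FAKE articles in the test set are predicted as REAL, while only 6 of the 629 REAL articles are predicted as FAKE. So the low accuracy seems to come mainly from missed FAKE articles rather than from errors spread evenly across both classes.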