Nov-23-2019, 11:58 PM
I used VADER for sentiment analysis. However, the examples I used to customise my code reference a 'class' column when printing the accuracy_score. VADER is an unsupervised method, so why is 'class' referenced? Also, some examples keep positive and negative reviews in separate files. Should that always be the case? I am new to Python programming.
Code and outputs are below;
# Importing libraries
import numpy as np
import pandas as pd
import nltk
#nltk.download('vader_lexicon')
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()  # creating an object of SentimentIntensityAnalyzer

# Import dataset
df = pd.read_csv("../vader2.tsv", delimiter='\t')
#df = pd.read_csv('../Restaurant_Reviews.tsv', sep='\\t')
df.head()
df.dropna(inplace=True)

# Collecting blank reviews from the data frame
blanks = []  # was "blanks = []," -- the trailing comma made this a tuple, so .append() would fail
for i, lb, rv in df.itertuples():
    if type(rv) == str:
        if rv.isspace():
            blanks.append(i)
#blanks

sid.polarity_scores(df.iloc[0]['review'])
df['scores'] = df['review'].apply(lambda review: sid.polarity_scores(review))
df.head()
df['compound'] = df['scores'].apply(lambda d: d['compound'])
df.head()
df['score'] = df['compound'].apply(lambda score: 'pos' if score >= 0 else 'neg')
#df['score'] = df['compound'].apply(lambda score: 1 if score >= 0 else 0)
df.head()

  class review scores compound score
0 pos Situated in a vibrant gated community the Melr... {'neg': 0.0, 'neu': 0.675, 'pos': 0.325, 'comp... 0.9260 pos
1 pos This hotel was very nice and in a great locati... {'neg': 0.0, 'neu': 0.548, 'pos': 0.452, 'comp... 0.9216 pos
2 neg The hotel has no gym,noisy the sound woke me u... {'neg': 0.145, 'neu': 0.855, 'pos': 0.0, 'comp... -0.2960 neg
3 pos We spent two nights at this Autograph Collecti... {'neg': 0.0, 'neu': 0.635, 'pos': 0.365, 'comp... 0.8475 pos
4 neg I could not believe what was meant to be a spo... {'neg': 0.104, 'neu': 0.719, 'pos': 0.177, 'co... 0.3876 pos
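Note row 4: a review labelled neg received a compound score of 0.3876, so the `score >= 0` cutoff classified it as pos. The VADER authors' usual convention uses ±0.05 as the cutoffs, with a neutral band in between. A minimal sketch of that thresholding (the function name and cutoff parameters here are illustrative, not from the code above):

```python
def vader_label(compound, pos_cut=0.05, neg_cut=-0.05):
    # Conventional VADER thresholds: >= 0.05 is positive,
    # <= -0.05 is negative, anything in between is neutral.
    if compound >= pos_cut:
        return 'pos'
    if compound <= neg_cut:
        return 'neg'
    return 'neu'

# Compound scores from the rows above, plus an exactly-neutral case.
print([vader_label(c) for c in [0.9260, -0.2960, 0.3876, 0.0]])
# ['pos', 'neg', 'pos', 'neu']
```

With these cutoffs row 4 would still be classified pos, since 0.3876 exceeds 0.05, but borderline scores near zero would no longer all be forced into the pos bucket.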
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

accuracy_score(df['class'], df['score'])
print(classification_report(df['class'], df['score']))
print(confusion_matrix(df['class'], df['score']))

              precision    recall  f1-score   support
neg 1.00 0.50 0.67 2
pos 0.75 1.00 0.86 3
accuracy 0.80 5
macro avg 0.88 0.75 0.76 5
weighted avg 0.85 0.80 0.78 5
[[1 1]
[0 3]]
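On why 'class' is needed: VADER itself is unsupervised (it predicts a sentiment without any training labels), but metrics like accuracy can only be computed by comparing those predictions against known ground-truth labels, which is what the 'class' column holds. A minimal pure-Python sketch using the five rows shown above (the variable names here are illustrative):

```python
# Ground-truth labels (the 'class' column) and VADER's predictions
# (the 'score' column) for the five rows shown above.
y_true = ['pos', 'pos', 'neg', 'pos', 'neg']
y_pred = ['pos', 'pos', 'neg', 'pos', 'pos']  # row 4 was misclassified

# Accuracy: fraction of predictions matching the ground truth.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.8

# Confusion matrix: rows = true label, columns = predicted label,
# in the order ['neg', 'pos'] -- the same layout sklearn prints.
labels = ['neg', 'pos']
matrix = [[sum(1 for t, p in zip(y_true, y_pred) if t == a and p == b)
           for b in labels] for a in labels]
print(matrix)  # [[1, 1], [0, 3]]
```

This reproduces the 0.80 accuracy and the confusion matrix above, and shows that without a ground-truth column there is nothing to compare the predictions against.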