Python Forum

Calculate Rating Score for Reviews Containing Specific Words
I want to calculate the average rating score for reviews that contain any of the words 'food', 'buffet', 'breakfast' or 'supper'. I would like to find out whether food quality affects the hotel rating. The code is:

# Importing Libraries 
import numpy as np   
import pandas as pd  
# Import dataset 
dataset = pd.read_csv("../output_rating.tsv", delimiter = '\t')
datatop = dataset.head()
datatop
   Rating                                             Review  Liked
0      30  It would be difficult to find a hotel with a b...      0
1      50  Clean rooms. Helpful staff. Close to waterfron...      1
2      50  This hotel deserves all the accolades. We cele...      1
3      50  I stayed at the Protea with my husband and our...      1
4      30  I stayed here while touring around SA and this...      1

# libraries to clean the data
import re
import collections

# Natural Language Toolkit
import nltk

# uncomment on the first run to download the stopword list
#nltk.download('stopwords')

# to remove stopwords
from nltk.corpus import stopwords

# for stemming
from nltk.stem.porter import PorterStemmer

# Initialize an empty list
# to collect the cleaned text
corpus = []

# create the stemmer and the stopword set once,
# instead of once per review and once per word
ps = PorterStemmer()
stop_words = set(stopwords.words('english'))

# clean every review row (len(dataset) avoids hard-coding the row count)
for i in range(len(dataset)):

    # column "Review", row i: replace every non-letter with a space
    review = re.sub('[^a-zA-Z]', ' ', dataset['Review'][i])

    # convert to lower case
    review = review.lower()

    # split into a list of words (default delimiter is whitespace)
    review = review.split()

    # keep the stem of each word in row i, skipping stopwords
    review = [ps.stem(word) for word in review
              if word not in stop_words]
    
    #MEAN RATING FOR FOOD COMMENTS
    wanted = "buffet breakfast food supper"
    avgRating = 0
    cnt = collections.Counter()
    word = dataset['Review'][i]
    if word in wanted:
        cnt[word]+=1
        print(cnt)
        avgRating = avgRating + dataset['Rating'][i]
    #END RATING SCORE
                  
    # rejoin all string array elements 
    # to create back into a string 
    review = ' '.join(review)   
      
    # append each string to create 
    # array of clean text  
    corpus.append(review)  
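Since the cleaning loop already turns each review into a list of stemmed, stopword-free tokens, one option is to do the food check on those tokens, with the running total, the match count and the `Counter` created once before the loop rather than inside it. The sketch below is a pure-Python illustration using a hypothetical `food_rating_summary` function and made-up tokens; it assumes you keep each token list (for example by appending `review` to a second list before the `' '.join`) and pass the ratings alongside. If the tokens are stemmed, the keywords should be stemmed the same way, e.g. `{ps.stem(w) for w in FOOD_WORDS}`.

```python
from collections import Counter

# Hypothetical keyword set; if your tokens are Porter-stemmed, stem these too
FOOD_WORDS = {"food", "buffet", "breakfast", "supper"}


def food_rating_summary(ratings, token_lists, keywords=FOOD_WORDS):
    """Return (mean rating, number of matching reviews, per-keyword review counts)
    over the reviews whose tokens include at least one keyword."""
    total = 0           # running sum of ratings, kept across all reviews
    matched = 0         # number of reviews mentioning at least one keyword
    counts = Counter()  # number of reviews mentioning each keyword
    for rating, tokens in zip(ratings, token_lists):
        hits = keywords.intersection(tokens)  # keywords present in this review
        if hits:
            matched += 1
            total += rating
            counts.update(hits)  # each keyword counted once per review
    mean = total / matched if matched else None
    return mean, matched, counts


# Made-up ratings and tokens for illustration only
ratings = [30, 50, 50, 40]
token_lists = [["great", "breakfast", "buffet"], ["clean", "room"],
               ["food", "excel"], ["supper", "cold"]]
mean, n, counts = food_rating_summary(ratings, token_lists)
print(mean, n)  # 40.0 3
```

Dividing only once, after all reviews have been seen, is what turns the running total into an average.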
The part of the code that I expect to calculate the average rating does not give me any output. That part is:
#MEAN RATING FOR FOOD COMMENTS
wanted = "buffet breakfast food supper"
avgRating = 0
cnt = collections.Counter()
word = dataset['Review'][i]
if word in wanted:
    cnt[word]+=1
    print(cnt)
    avgRating = avgRating + dataset['Rating'][i]
#END RATING SCORE
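A likely reason this block never prints anything: `word = dataset['Review'][i]` is the whole review, not a single word, so `word in wanted` asks whether the entire review text is a substring of the string `"buffet breakfast food supper"`, which is practically never true. Even on a match, `avgRating` and `cnt` are reset at the top of every iteration and nothing is ever divided by the number of matches, so no average could come out. A minimal vectorized sketch with pandas, assuming the `Rating` and `Review` columns shown above (the names `FOOD_PATTERN` and `mean_food_rating` are illustrative, not from the thread), computes the mean without the loop:

```python
import pandas as pd

# Whole-word, case-insensitive match on any of the four food words.
# (?:...) is a non-capturing group, so str.contains does not warn about match groups.
FOOD_PATTERN = r"\b(?:food|buffet|breakfast|supper)\b"


def mean_food_rating(df: pd.DataFrame) -> float:
    """Mean of the 'Rating' column over rows whose 'Review' mentions a food word."""
    # One boolean per row: True if the review contains at least one food word
    mentions_food = df["Review"].str.contains(
        FOOD_PATTERN, case=False, regex=True, na=False
    )
    # Sum and count are taken over all matching rows at once, not reset per row
    return df.loc[mentions_food, "Rating"].mean()


# Small made-up example, not data from the thread
df = pd.DataFrame({
    "Rating": [30, 50, 50, 40],
    "Review": ["Great Breakfast buffet", "Clean rooms",
               "The food was excellent", "Nice seafood"],
})
print(mean_food_rating(df))  # 40.0 ("seafood" is not a whole-word match)
```

On the real data this would be `mean_food_rating(dataset)`. The `\b` word boundaries keep "seafood" from counting as "food" but also skip plurals such as "buffets"; widening the pattern (e.g. `buffets?`) is a judgment call.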
(Nov-15-2019, 01:34 PM) bongielondy wrote: I would like to find out if the food quality affects the hotel rating.
Moved to the data science section, since this is more of a data science/algorithm question than Python itself. If you don't end up getting a reply here, you may want to seek a forum dedicated specifically to the kind of modeling you're doing.
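As a rough first step toward the question of whether food affects the rating, one could compare the average rating of reviews that mention food with those that don't. The sketch below assumes the same `Rating` and `Review` columns as `dataset`; `compare_food_mentions` and `FOOD_PATTERN` are hypothetical names and the data is made up.

```python
import pandas as pd

# Hypothetical pattern: whole-word, case-insensitive food keywords
FOOD_PATTERN = r"\b(?:food|buffet|breakfast|supper)\b"


def compare_food_mentions(df: pd.DataFrame) -> pd.DataFrame:
    """Review count and mean 'Rating' for reviews with vs without a food word."""
    mentions_food = df["Review"].str.contains(
        FOOD_PATTERN, case=False, regex=True, na=False
    ).rename("mentions_food")
    # Group rows by the boolean flag: False = no food word, True = food word
    return df.groupby(mentions_food)["Rating"].agg(["count", "mean"])


# Made-up example data
df = pd.DataFrame({
    "Rating": [30, 50, 50, 40],
    "Review": ["Great Breakfast buffet", "Clean rooms",
               "The food was excellent", "Nice seafood"],
})
print(compare_food_mentions(df))
```

This only measures whether food is mentioned, not whether the comment about it was positive or negative, and a gap between the two means does not by itself show cause. Crossing the flag with the existing `Liked` column, or with a per-review sentiment score, would get closer to "food quality".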