KeyError -read multiple lines - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: KeyError -read multiple lines (/thread-22232.html)

KeyError -read multiple lines - bongielondy - Nov-04-2019

I am new to Python. An example review-classification script, in which each review is a single line, runs well. In my data the reviews span multiple lines. I converted the CSV file to TSV. The reviews file has two columns, Review and Liked; Liked contains 0 for 'not liked' or 1 for 'liked'. This is for natural language processing.
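Because the reviews span several lines, converting the file to TSV may split one review over several rows, which the parser can then reject as bad lines or read as extra rows. A minimal sketch, assuming the original CSV wraps each multi-line review in double quotes (the usual CSV convention): `pandas.read_csv` keeps a quoted field with embedded newlines as a single value, so the CSV can often be read directly without converting it. The sample text below is invented for illustration.

```python
import io

import pandas as pd

# Hypothetical sample: the first review spans two lines inside double quotes.
raw_csv = '''Review,Liked
"Great location.
Friendly staff, would stay again.",1
"Room was noisy.",0
'''

# read_csv treats the quoted field as one value (newline included),
# so each review stays on a single DataFrame row.
dataset = pd.read_csv(io.StringIO(raw_csv))

print(len(dataset))          # number of reviews, not number of text lines
print(dataset["Review"][0])  # the full two-line review
```

Note that in pandas 1.3 and later `error_bad_lines` is deprecated in favour of `on_bad_lines='skip'`, and pandas 2.0 removed it. Either way, every skipped line is a review lost from the dataset, so the row count ends up smaller than expected.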

# Importing Libraries 
import numpy as np   
import pandas as pd  
  
# Import dataset 
dataset = pd.read_csv("../AfricanPride_b.txt", delimiter = '\t', error_bad_lines = False)

# library to clean data 
import re  
  
# Natural Language Tool Kit 
import nltk  
  
nltk.download('stopwords') 
  
# to remove stopword 
from nltk.corpus import stopwords 
  
# for stemming purposes
from nltk.stem.porter import PorterStemmer 
  
# Initialize empty array 
# to append clean text  
corpus = []  
  
# 1000 (reviews) rows to clean 
for i in range(0, 5000):  
      
    # column : "Review", row ith 
    review = re.sub('[^a-zA-Z]', ' ', dataset['Review'][i])  
      
    # convert all cases to lower cases 
    review = review.lower()  
      
    # split into a list of words (splits on any whitespace)
    review = review.split()  
      
    # creating PorterStemmer object to 
    # take main stem of each word 
    ps = PorterStemmer()  
      
    # loop for stemming each word 
    # in string array at ith row     
    review = [ps.stem(word) for word in review 
                if not word in set(stopwords.words('english'))]  
                  
    # rejoin all string array elements 
    # to create back into a string 
    review = ' '.join(review)   
      
    # append each string to create 
    # array of clean text  
    corpus.append(review)

This results in a KeyError:

KeyError                                  Traceback (most recent call last)
<ipython-input-8-0f0b9d7dcfd5> in <module>
     21 
     22     # column : "Review", row ith
---> 23     review = re.sub('[^a-zA-Z]', ' ', dataset['Review'][i])
     24 
     25     # convert all cases to lower cases
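The arrow points at `dataset['Review'][i]`. Indexing a pandas Series with an integer this way is a label lookup, so it raises KeyError as soon as `i` is not one of the row labels, for example when `range(0, 5000)` runs past the number of rows actually loaded (`error_bad_lines = False` drops malformed rows, which makes the DataFrame shorter still). A `KeyError: 'Review'` would instead mean the column header was not parsed as expected. The traceback is cut off before the line that names the missing key, so treat this as the likely cause rather than a confirmed one. The sketch below, on made-up data, reproduces the error and then restructures the loop to iterate over the reviews that exist; `clean_reviews` is a hypothetical helper, and the stop words and stemmer are passed in so the sketch runs without downloading NLTK data.

```python
import re
from typing import Callable, Iterable, List, Set

import pandas as pd

# Hypothetical three-row dataset standing in for the real reviews file.
dataset = pd.DataFrame({"Review": ["The room WAS great!", "Not\nclean.", "Fine"],
                        "Liked": [1, 0, 1]})

# Reproduce the error: dataset["Review"][i] looks i up as a row label.
missing = []
for i in range(0, 5):  # more iterations than rows, like range(0, 5000)
    try:
        dataset["Review"][i]  # label lookup
    except KeyError as e:
        missing.append(e.args[0])
print(missing)  # [3, 4]


def clean_reviews(reviews: Iterable[str], stop_words: Set[str],
                  stem: Callable[[str], str]) -> List[str]:
    """Hypothetical helper: one cleaned string per review that actually exists."""
    corpus = []
    for text in reviews:  # no hard-coded row count, so no KeyError
        # keep letters only, lower-case, split on whitespace
        words = re.sub("[^a-zA-Z]", " ", text).lower().split()
        # drop stop words and stem what is left
        corpus.append(" ".join(stem(w) for w in words if w not in stop_words))
    return corpus


# Toy stop words and an identity "stemmer" keep the demo free of NLTK downloads.
corpus = clean_reviews(dataset["Review"], {"the", "was"}, lambda w: w)
print(corpus)
```

With NLTK the call would look like `clean_reviews(dataset['Review'], set(stopwords.words('english')), PorterStemmer().stem)`. Building the stop-word set once also avoids the original loop's cost of recreating it for every word it checks.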

The rest of the code is:

# Creating the Bag of Words model
from sklearn.feature_extraction.text import CountVectorizer

# Keep at most the 1500 most frequent words as features.
# "max_features" is a parameter to
# experiment with to get better results
cv = CountVectorizer(max_features = 1500)

# X holds the word counts of the corpus (independent variables / features)
X = cv.fit_transform(corpus).toarray()

# y holds the Liked labels: whether each
# review is positive (1) or negative (0)
y = dataset.iloc[:, 1].values 
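As a sketch of what this step produces: `CountVectorizer` turns each cleaned review into one row of word counts, so `X` must have exactly one row per label in `y`. If the cleaning loop skipped or repeated reviews, the shapes no longer match and later steps fail. The corpus and labels here are invented for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical cleaned corpus and matching Liked labels.
corpus = ["room great", "not clean", "great staff great food"]
y = np.array([1, 0, 1])

cv = CountVectorizer(max_features=1500)
X = cv.fit_transform(corpus).toarray()  # rows = reviews, columns = words

# One feature row per label; fails loudly if corpus and labels drifted apart.
assert X.shape[0] == len(y), (X.shape, y.shape)
print(X.shape)
```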

# Splitting the dataset into 
# the Training set and Test set 
from sklearn.model_selection import train_test_split 
  
# experiment with "test_size" 
# to get better results 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25) 
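Two optional `train_test_split` parameters are worth knowing at this step, shown here as a sketch on toy data: `random_state` makes the split reproducible between runs, and `stratify=y` keeps the proportion of liked and not-liked reviews similar in the training and test sets. Whether stratifying matters depends on how balanced the real Liked column is.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy features and labels: 8 samples, half liked (1), half not liked (0).
X = np.arange(16).reshape(8, 2)
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
print(len(X_train), len(X_test))
print(y_test)  # one sample of each class, thanks to stratify=y
```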

# Fitting Random Forest Classification 
# to the Training set 
from sklearn.ensemble import RandomForestClassifier 
  
# n_estimators is the number of
# trees in the forest; experiment with
# n_estimators to get better results
model = RandomForestClassifier(n_estimators = 501, 
                            criterion = 'entropy') 
                              
model.fit(X_train, y_train)

# Predicting the Test set results 
y_pred = model.predict(X_test) 
  
y_pred 

# Making the Confusion Matrix 
from sklearn.metrics import confusion_matrix 
  
cm = confusion_matrix(y_test, y_pred) 
  
cm
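For a binary target like Liked, `confusion_matrix(y_test, y_pred)` returns a 2×2 array whose rows are the true classes and whose columns are the predicted classes, both ordered 0, 1. Accuracy is the diagonal (correct predictions) divided by the total. A quick check with made-up labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true labels and predictions.
y_test = np.array([1, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1])

cm = confusion_matrix(y_test, y_pred)
# rows: true 0, true 1; columns: predicted 0, predicted 1
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(accuracy, accuracy_score(y_test, y_pred))  # the two values agree
```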

RE: KeyError -read multiple lines - MckJohan - Nov-04-2019

Can you investigate the code below, especially the value of ps and its contents? You could add a couple of print statements; the KeyError should not be difficult to find.

    # creating PorterStemmer object to 
    # take main stem of each word 
    ps = PorterStemmer()  
       
    # loop for stemming each word 
    # in string array at ith row     
    review = [ps.stem(word) for word in review 
                if not word in set(stopwords.words('english'))]

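You can try wrapping the code in a try/except block, for example:

try:
    review = re.sub('[^a-zA-Z]', ' ', dataset['Review'][i])
except KeyError as e:
    print("KeyError at row", i, "- missing key:", e)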

RE: KeyError -read multiple lines - bongielondy - Nov-06-2019

Thank you. I have updated the code to:

for i in range(0, 1000):  
     
        
    # column : "Review", row ith
    try:
        review = re.sub('[^a-zA-Z]', ' ', dataset['Review'][i])  
      
        # convert all cases to lower cases 
        review = review.lower()  
      
        # split to array(default delimiter is " ") 
        review = review.split()  
      
        # creating PorterStemmer object to 
        # take main stem of each word 
        ps = PorterStemmer()  
      
        # loop for stemming each word 
        # in string array at ith row     
        review = [ps.stem(word) for word in review 
                if not word in set(stopwords.words('english'))]  
                  
        # rejoin all string array elements 
        # to create back into a string 
        review = ' '.join(review)   
      
        # append each string to create 
        # array of clean text  
        corpus.append(review)
    except KeyError as e:
        print(ps.stem(review))

I seem to get numerous lines of the same review. I will look at the source file again and give feedback. The output is:

wife visit johannesburg famili function stay locat citi never felt comfort secur stay african pride melros arch mani restaur importantli servic attent staff african pride provid except cannot rave enough stay accommod would use citi world class
wife visit johannesburg famili function stay locat citi never felt comfort secur stay african pride melros arch mani restaur importantli servic attent staff african pride provid except cannot rave enough stay accommod would use citi world class
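A likely explanation for the repeated line: when `dataset['Review'][i]` raises KeyError, the assignment to `review` never happens, so the `except` branch prints (and stems) whatever `review` still holds from the last row that succeeded. Every failing `i` therefore prints the same text. Catching the error also hides a second problem: skipped rows are missing from `corpus` but still present in `y = dataset.iloc[:, 1].values`, so `X` and `y` no longer line up. The sketch below uses a made-up DataFrame to reproduce the stale value and then shows one way to keep text and label together, assuming problem rows appear as missing (`None`/NaN) reviews; the lower-casing is a placeholder for the real cleaning steps.

```python
import pandas as pd

# Hypothetical stand-in for the reviews file: row 1 has no review text.
dataset = pd.DataFrame({"Review": ["first review", None, "last review"],
                        "Liked": [1, 0, 0]})

# 1) Why the same text repeats: the failed assignment leaves `review` unchanged.
printed = []
review = None
for i in range(0, 5):  # runs past the last row label (2), like range(0, 1000)
    try:
        review = dataset["Review"][i]  # KeyError for i = 3 and 4
    except KeyError:
        printed.append(review)  # stale value from the last successful row
print(printed)

# 2) Safer: loop over the rows that exist and keep text and label together,
#    so a skipped row is dropped from both lists.
corpus, labels, skipped = [], [], []
for i, (text, liked) in enumerate(zip(dataset["Review"], dataset["Liked"])):
    if not isinstance(text, str):
        skipped.append(i)  # e.g. a missing (None/NaN) review
        continue
    corpus.append(text.lower())  # placeholder for the real cleaning steps
    labels.append(liked)
print(skipped, len(corpus) == len(labels))
```

Calling `dataset.dropna(subset=['Review']).reset_index(drop=True)` before the loop is another way to get the same alignment, since both the text column and the label column then come from the same filtered rows.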