Python Forum
KeyError -read multiple lines
#1
I am new to Python. I am working from an example that cleans a file of reviews in which each review sits on a single line, and that example runs well. My reviews span multiple lines. I converted my CSV file to TSV. The reviews file has two columns, Review and Liked. Liked contains 0 or 1, for 'not liked' or 'liked'. This is for natural language processing.
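For readers with the same kind of file: a review that spans several physical lines is normally kept as one record by wrapping the field in double quotes. The sketch below shows that convention on a small, made-up tab-separated sample, using the standard-library `csv` module rather than pandas. The sample text and the names `raw` and `rows` are assumptions for illustration, not the poster's data.

```python
import csv
import io

# Hypothetical tab-separated sample. The second review spans two physical
# lines and is wrapped in double quotes, so the parser reads it as one field.
raw = 'Review\tLiked\nGood value\t1\n"Arrived late.\nWould not buy again."\t0\n'

# newline="" leaves line endings untouched so the csv module can handle them.
reader = csv.DictReader(io.StringIO(raw, newline=""), delimiter="\t")
rows = list(reader)

print(len(rows))          # number of logical records, not physical lines
print(rows[1]["Review"])  # the two-line review, with its newline preserved
```

If the multi-line reviews in the real file are not quoted, a parser has no way to tell where one review ends and the next begins, and rows may be split or dropped.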

# Importing Libraries 
import numpy as np   
import pandas as pd  
  
# Import dataset 
dataset = pd.read_csv("../AfricanPride_b.txt", delimiter = '\t', error_bad_lines = False)

# library to clean data 
import re  
  
# Natural Language Tool Kit 
import nltk  
  
nltk.download('stopwords') 
  
# to remove stopword 
from nltk.corpus import stopwords 
  
# for stemming purposes
from nltk.stem.porter import PorterStemmer 
  
# Initialize an empty list
# to collect the cleaned text
corpus = []  
  
# rows (reviews) to clean: 0 through 4999
for i in range(0, 5000):  
      
    # column : "Review", row ith 
    review = re.sub('[^a-zA-Z]', ' ', dataset['Review'][i])  
      
    # convert all cases to lower cases 
    review = review.lower()  
      
    # split to array(default delimiter is " ") 
    review = review.split()  
      
    # creating PorterStemmer object to 
    # take main stem of each word 
    ps = PorterStemmer()  
      
    # loop for stemming each word 
    # in string array at ith row     
    review = [ps.stem(word) for word in review
                if word not in set(stopwords.words('english'))]
                  
    # rejoin all string array elements 
    # to create back into a string 
    review = ' '.join(review)   
      
    # append each string to create 
    # array of clean text  
    corpus.append(review) 
This results in a KeyError.
KeyError Traceback (most recent call last)
<ipython-input-8-0f0b9d7dcfd5> in <module>
21
22 # column : "Review", row ith
---> 23 review = re.sub('[^a-zA-Z]', ' ', dataset['Review'][i])
24
25 # convert all cases to lower cases
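
The traceback above is cut off before the missing key is shown, so the cause cannot be read from it directly. Two common candidates are a column that was not parsed under the exact name `Review`, and a frame with fewer than 5000 rows (note that `error_bad_lines = False` silently skips lines that fail to parse). The following is a minimal sketch of how one might check both, assuming pandas is installed. The small `dataset` built here is a hypothetical stand-in for the real file.

```python
import pandas as pd

# Hypothetical stand-in for the poster's file: only 3 rows,
# with the "Review" and "Liked" columns described in the post.
dataset = pd.DataFrame({
    "Review": ["Great product!", "Did not like it.", "Works\nacross two lines."],
    "Liked": [1, 0, 1],
})

# 1. Check the column names that were actually parsed, and the row count.
print(list(dataset.columns))
print(len(dataset))

# 2. Asking for a row label that does not exist raises KeyError,
#    which is what a fixed range(0, 5000) does on a shorter frame.
try:
    dataset["Review"][4999]
    raised = False
except KeyError:
    raised = True
print(raised)

# 3. Iterating over the column itself can never run past the last row.
#    str() guards against non-string values such as NaN from empty cells.
reviews = [str(text) for text in dataset["Review"]]
```

With this pattern the cleaning loop would start with `for text in dataset["Review"]:` and use `text` where the original uses `dataset['Review'][i]`.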

The rest of the code is:
# Creating the Bag of Words model
from sklearn.feature_extraction.text import CountVectorizer 
  
# Extract at most 1500 features.
# "max_features" is a parameter to
# experiment with to get better results
cv = CountVectorizer(max_features = 1500)  
  
# X contains the corpus as features (independent variables)
X = cv.fit_transform(corpus).toarray()  
  
# y contains the labels: whether each
# review is positive or negative
y = dataset.iloc[:, 1].values 

# Splitting the dataset into 
# the Training set and Test set 
from sklearn.model_selection import train_test_split 
  
# experiment with "test_size" 
# to get better results 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25) 

# Fitting Random Forest Classification 
# to the Training set 
from sklearn.ensemble import RandomForestClassifier 
  
# n_estimators is the number of
# trees; experiment with it
# to get better results
model = RandomForestClassifier(n_estimators = 501, 
                            criterion = 'entropy') 
                              
model.fit(X_train, y_train)

# Predicting the Test set results 
y_pred = model.predict(X_test) 
  
y_pred 

# Making the Confusion Matrix 
from sklearn.metrics import confusion_matrix 
  
cm = confusion_matrix(y_test, y_pred) 
  
cm

Messages In This Thread
KeyError -read multiple lines - by bongielondy - Nov-04-2019, 09:19 PM
RE: KeyError -read multiple lines - by MckJohan - Nov-04-2019, 11:37 PM
RE: KeyError -read multiple lines - by bongielondy - Nov-06-2019, 01:33 AM
