Please help :)

crystalteoh92 · Oct-09-2017, 03:54 AM

Hi all, I am new to Python and am currently doing NLTK sentiment analysis on customer reviews. I have a SAS data set that stores all the customer reviews. I imported it into Python, where it became a DataFrame. I tried to convert the DataFrame column into a list and apply SentimentIntensityAnalyzer, but it doesn't seem to work.

Can anyone please help? Thanks

mydata = pd.read_sas('C:\\Users\\00124118\\Desktop\\call_center_data.sas7bdat')
dfList = mydata['DESCRIPTION'].tolist()

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

call = []
vs_compound = []
vs_pos = []
vs_neu = []
vs_neg = []

for i in range(0, len(dflist)):
    call.append(dflist[i]['text'])
    vs_compound.append(analyzer.polarity_scores(dflist[i]['text'])['compound'])
    vs_pos.append(analyzer.polarity_scores(dflist[i]['text'])['pos'])
    vs_neu.append(analyzer.polarity_scores(dflist[i]['text'])['neu'])
    vs_neg.append(analyzer.polarity_scores(dflist[i]['text'])['neg'])

from pandas import Series, DataFrame

final = DataFrame({'Call': call,
                        'Compound': vs_compound,
                        'Positive': vs_pos,
                        'Neutral': vs_neu,
                        'Negative': vs_neg})
final = final[['Call', 'Compound','Positive', 'Neutral', 'Negative']]

Currently I'm getting this error

Error:  File "<ipython-input-119-00c678e5dcd7>", line 2, in <module>
    call.append(dflist[i]['text'])

TypeError: byte indices must be integers or slices, not str

**buran** · Oct-09-2017, 06:31 AM

This error looks inconsistent with your code.
The line call.append(dflist[i]['text']) is on line 15 of your code, but according to the error it is on line 2.

Also note the difference between dfList on line 2 and dflist on line 15. I would guess these are meant to be the same.

crystalteoh92 · (This post was last modified: Oct-09-2017, 07:07 AM by crystalteoh92.)

(Oct-09-2017, 06:31 AM)buran Wrote: This error looks inconsistent with your code.
The line call.append(dflist[i]['text']) is on line 15 of your code, but according to the error it is on line 2.

Also note the difference between dfList on line 2 and dflist on line 15. I would guess these are meant to be the same.

I didn't run the whole Python code in one go. I ran it in several parts, which is why the line number is different. And yes, the error is on line 15, not line 2. Do you have any idea what causes this error?

I tried changing dfList on line 2 to dflist, but it still doesn't work.

**buran** · Oct-09-2017, 07:13 AM

Assuming dflist is actually a list, you cannot use 'text' as an index unless its elements are dicts. The error says the element being indexed is a bytes object.
I don't understand what you mean by "I didn't run the whole Python code in one go". In any case, dflist must be created and populated before you run the problem line. Please provide the exact code that produces the error.
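
For what it's worth, the error message can be reproduced without pandas or NLTK. A minimal sketch, assuming each element of dflist is a bytes object (the sample value below is made up for illustration):

```python
# Hypothetical element, standing in for one entry of dflist.
element = b'customer is happy with the premium'

first_byte = element[0]     # an integer index returns one byte value (an int)
first_word = element[0:8]   # a slice returns a shorter bytes object

try:
    element['text']         # a str key is not a valid index for bytes
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

So the loop fails on the ['text'] part of dflist[i]['text'], not on dflist[i] itself.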

crystalteoh92 · Oct-09-2017, 08:13 AM

Sorry for the confusion. Okay, I think I should explain again.
I have a data set that stores the customer reviews, like the table below:

POLICY_NO  REVIEW
1111       customer is happy with the premium
2222       customer called to enquire about NCB
3333       call drop off
4444       customer is very angry that they did not receive their policy yet

What I would like to achieve is a score for each review that indicates whether it is a negative or a positive comment:

POLICY_NO  REVIEW                                            compound  positive  negative  neutral
1111       customer is happy with the premium                0.8       0.7       0         0
2222       customer called to enquire about NCB              0         0         0         1
3333       call drop off                                     0         0         0         1
4444       customer is very angry that they did not receive  -0.6      0         -0.8      0

This is what i ran

for i in range(0, len(dflist)):
    call.append(dflist[i]['text'])
    vs_compound.append(analyzer.polarity_scores(dflist[i]['text'])['compound'])
    vs_pos.append(analyzer.polarity_scores(dflist[i]['text'])['pos'])
    vs_neu.append(analyzer.polarity_scores(dflist[i]['text'])['neu'])
    vs_neg.append(analyzer.polarity_scores(dflist[i]['text'])['neg'])

and this is the error

Error: Traceback (most recent call last):

  File "<ipython-input-124-00c678e5dcd7>", line 2, in <module>
    call.append(dflist[i]['text'])

TypeError: byte indices must be integers or slices, not str

I took this code from this website and tried to apply it to my scenario:
http://t-redactyl.io/blog/2017/04/applyi...r-api.html

**buran** · (This post was last modified: Oct-09-2017, 08:24 AM by buran.)

No, that is not what you ran. If you run just

for i in range(0, len(dflist)):
    call.append(dflist[i]['text'])
    vs_compound.append(analyzer.polarity_scores(dflist[i]['text'])['compound'])
    vs_pos.append(analyzer.polarity_scores(dflist[i]['text'])['pos'])
    vs_neu.append(analyzer.polarity_scores(dflist[i]['text'])['neu'])
    vs_neg.append(analyzer.polarity_scores(dflist[i]['text'])['neg'])

what you get (of course) is

Error: Traceback (most recent call last):
  File "C:\Users\BKolev\Desktop\foo.py", line 1, in <module>
    for i in range(0, len(dflist)):
NameError: name 'dflist' is not defined

In other words, we need to see how you populate dflist. The problem is that your dflist does not have the same structure as data_all in the example you refer to.
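
A quick way to check the structure yourself is to look at the type of the container and of its first few elements. This is only a sketch: the helper name describe and both sample lists are made up for illustration.

```python
def describe(items, sample=3):
    """Return the container type and the types of the first few elements."""
    return type(items).__name__, [type(x).__name__ for x in items[:sample]]

# Hypothetical data: a list of dicts, the shape that dflist[i]['text'] expects
list_of_dicts = [{'text': 'call drop off'}]
# Hypothetical data: a flat list of bytes objects
flat_list = [b'call drop off']

print(describe(list_of_dicts))
print(describe(flat_list))
```

If the element type is not dict, then indexing an element with ['text'] cannot work.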

crystalteoh92 · (This post was last modified: Oct-09-2017, 08:44 AM by crystalteoh92.)

I see. Could you recommend how I could move forward? All the code I have found online for the sentiment intensity analyzer works on a list, but my data is in a pandas DataFrame.

Reference: http://opensourceforu.com/2016/12/analys...ents-nltk/

Here is my list in Python:

dflist = mydata['DESCRIPTION'].tolist()

print(dflist)
[b'xfer nb -5099 - kindly cb cust,tq......Wan,pls assist..------..RENEW WITH POS MALAYSIA', b'Miss Chia query how to make a online payment, adv used M2U online and select Etiqa - Life Ins', b'enq for quotation for renewal....pls cb cust..---------..kakej assist (DONE)', b'Cust looking huda....xfer nb-5010-huda.']

I don't understand why there is a b' ' in front of every sentence.

**buran** · Oct-09-2017, 08:59 AM

Why the b? Again, it depends on how you populate dflist, which is what I have been asking about all along. The b prefix marks the value as a binary string (a bytes object).
As for the other problem: at the moment your dflist is a flat list of strings, so you need to loop over it like this:

dflist = [b'xfer nb -5099 - kindly cb cust,tq......Wan,pls assist..------..RENEW WITH POS MALAYSIA', b'Miss Chia query how to make a online payment, adv used M2U online and select Etiqa - Life Ins', b'enq for quotation for renewal....pls cb cust..---------..kakej assist (DONE)', b'Cust looking huda....xfer nb-5010-huda.']
for tweet_txt in dflist:
    # score each text once and reuse the resulting dict
    scores = analyzer.polarity_scores(tweet_txt)
    call.append(tweet_txt)
    vs_compound.append(scores['compound'])
    vs_pos.append(scores['pos'])
    vs_neu.append(scores['neu'])
    vs_neg.append(scores['neg'])

If the analyzer does not work with binary strings, you can change the way you populate dflist, or convert the elements to strings like this:

dflist = [txt.decode('utf-8') for txt in dflist]
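
One caveat with the decode step: it raises UnicodeDecodeError if a review contains bytes that are not valid UTF-8. Here is a hedged sketch of a more tolerant helper. The name to_text and the sample data are made up for illustration, and whether replacing bad bytes is acceptable depends on your data.

```python
def to_text(value, encoding='utf-8'):
    """Decode bytes to str; pass any other value through unchanged."""
    if isinstance(value, bytes):
        # errors='replace' substitutes U+FFFD for bytes that cannot be decoded
        return value.decode(encoding, errors='replace')
    return value

# Hypothetical sample: clean bytes, an existing str, and a non-UTF-8 byte
dflist = [b'call drop off', 'already text', b'caf\xe9']
texts = [to_text(item) for item in dflist]
print(texts)
```

pandas.read_sas also accepts an encoding argument, so passing the encoding the SAS file actually uses (which you would need to verify) may give you str values directly instead of bytes.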

In the example you refer to, data_all is a list of dicts.
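
To make the difference concrete, here is a small sketch of how the text is reached in each shape. The contents of data_all below are made up; only the 'text' key is taken from the code in this thread.

```python
# Hypothetical stand-in for data_all: a list of dicts with a 'text' key
data_all = [{'text': 'customer is happy'}, {'text': 'call drop off'}]
# A flat list of strings, like dflist after decoding
dflist = ['customer is happy', 'call drop off']

# List of dicts: each element needs a key lookup to reach the text
texts_from_dicts = [item['text'] for item in data_all]
# Flat list: each element already is the text
texts_from_list = [text for text in dflist]

print(texts_from_dicts == texts_from_list)
```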

crystalteoh92 · Oct-09-2017, 09:20 AM

Thank you so much! It is working now. :)

Sorry, I'm very new to Python and my question may sound stupid, but I really want to ask: why can tweet_txt be linked to each sentence in dflist? How does the mechanism work?
I tried changing tweet_txt to sentence, and it still works.

**buran** · (This post was last modified: Apr-02-2018, 08:06 AM by buran.)

This is one of the looping techniques available in Python. If you have an iterable (a list, dict, string, etc.), you can iterate over its elements with a for loop. In this case tweet_txt is just a name (i.e. a variable) that takes the value of each element of the list in turn. As you discovered, you can use sentence to the same effect. The important thing is to use descriptive names when you code.
Check this tutorial https://python-forum.io/Thread-Basic-Nev...n-sequence and especially the article linked at the end: https://nedbatchelder.com/text/iter.html
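
A short sketch of the same idea, with made-up sample data:

```python
reviews = ['customer is happy', 'call drop off']

# The loop variable is just a name; 'sentence' works the same as 'tweet_txt'
collected = []
for sentence in reviews:
    collected.append(sentence.upper())

# enumerate() supplies the index when it is actually needed
numbered = [(i, text) for i, text in enumerate(reviews)]

# Strings iterate character by character; dicts iterate over their keys
chars = [ch for ch in 'NCB']
keys = [k for k in {'pos': 0.7, 'neg': 0.0}]

print(collected, numbered, chars, keys)
```

This is why for i in range(0, len(dflist)) is rarely needed: the for loop hands you each element directly.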

EDIT (02 April 2018): Ned Batchelder's post/presentation is for Python 2. For the changes in dict iteration introduced with Python 3, please check PEP 469 -- Migration of dict iteration code to Python 3
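
As a rough illustration of the Python 3 side of that change (the scores dict is made up):

```python
scores = {'neg': 0.0, 'neu': 0.3, 'pos': 0.7}

# Python 3: items() returns a view that can be looped over directly
pairs = []
for name, value in scores.items():
    pairs.append((name, value))

# Python 2's iteritems() is not available on Python 3 dicts
has_iteritems = hasattr(scores, 'iteritems')
print(pairs, has_iteritems)
```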
