Python Forum

Full Version: Making a list for positive vs negative reviews based on rating
Hi guys!

I'm very new to Python, so hopefully someone can give me pointers in regards to "for loops".

I'm doing a sentiment analysis and am trying to make separate lists for positive and negative reviews. The data is in 3 columns (id, sentiment, review).

To give you an idea, the data head looks like this:

data.head()
OUTPUT:
       id  sentiment                                             review
0  5814_8          1  With all this stuff going down at the moment w...
1  2381_9          1  \The Classic War of the Worlds\" by Timothy Hi...



Why does the code below not work? I don't get an error, but when I try to print the lists, nothing happens.
I basically want to separate pos vs neg reviews into two lists, so I can compare the word count of the unique words.
num_reviews = data["review"].size

positive_reviews = []
negative_reviews = []

#Positive reviews have a sentiment of 1, negative a sentiment of 0


for i in range(0, num_reviews):
    if data["sentiment"][i] == "1":
        positive_reviews.append(data["review"][i])
    else:
        if data["sentiment"][i] == "0":
            negative_reviews.append(data["review"][i])
Moderator: sparkz_alot
Please use 'code' tags when posting. Information can be found in the Help Document.
I suppose that you are using a pandas DataFrame...

Check the dtype of your sentiment column. It's quite possible that it is an integer column, in which case both of your conditions evaluate to False and neither list ever gets anything appended (comparing against 0 or 1 instead of "0" or "1" would work).
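
To see the effect in isolation, here is a minimal sketch. The small DataFrame is made up to mimic the column layout from the question (the ids and review texts are placeholders, not the real data), and it assumes the sentiment column was read in as integers:

```python
import pandas as pd

# Hypothetical stand-in for the poster's data; all values are invented.
data = pd.DataFrame({
    "id": ["a_1", "b_2", "c_3"],
    "sentiment": [1, 0, 1],
    "review": ["good film", "bad film", "great film"],
})

# An integer dtype here means the values are numbers, not strings.
print(data["sentiment"].dtype)

# Comparing integer values with the string "1" matches no rows...
matches_str = int((data["sentiment"] == "1").sum())
# ...while comparing with the integer 1 matches the positive rows.
matches_int = int((data["sentiment"] == 1).sum())
print(matches_str, matches_int)
```

If the dtype printed for the real data turns out to be object (strings) instead, the original string comparison would be the right one and the problem lies elsewhere.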

Also, iterating over the rows of a DataFrame is very often a terrible idea (pandas and the underlying NumPy are optimized for vectorized operations on columns, so row-by-row iteration is inefficient), and this case is no exception. The following code should have the same functionality as yours:

positive = data[data["sentiment"] == 1]["review"].tolist()

negative = data[data.sentiment == 0].review.tolist()
# Selecting columns with . is usually shorter, but doesn't work for "ugly" names (names containing spaces or symbols, or names that clash with DataFrame methods).
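
As a self-contained illustration of the same boolean-mask idea, here is a sketch on invented sample data (the DataFrame contents are hypothetical). It uses .loc to pick the rows and the column in one step, which is an alternative spelling of the selection above rather than a required change:

```python
import pandas as pd

# Hypothetical sample data; invented for illustration only.
data = pd.DataFrame({
    "id": ["a_1", "b_2", "c_3", "d_4"],
    "sentiment": [1, 0, 1, 0],
    "review": ["good film", "bad film", "great film", "dull film"],
})

# Each comparison produces a boolean mask over the rows;
# .loc then selects the matching rows and the "review" column together.
positive = data.loc[data["sentiment"] == 1, "review"].tolist()
negative = data.loc[data["sentiment"] == 0, "review"].tolist()

print(positive)
print(negative)
```

No explicit loop or index counter is needed; the comparison is applied to the whole column at once.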