Python Forum

Full Version: trying to understand the python code
I am new to machine learning and Python. Recently I have been working with the Amazon Fine Food Reviews data from Kaggle and the code that comes with it.
What I don't understand is how the 'partition' function is used here.
Also, what do the last 3 lines of code actually do?

%matplotlib inline
import sqlite3
import pandas as pd
import numpy as np
import nltk
import string
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn import metrics
from sklearn.metrics import roc_curve, auc
from nltk.stem.porter import PorterStemmer



# using the SQLite Table to read data.
con = sqlite3.connect('./amazon-fine-food-reviews/database.sqlite') 




#filtering only positive and negative reviews i.e. 
# not taking into consideration those reviews with Score=3
filtered_data = pd.read_sql_query("""
SELECT *
FROM Reviews
WHERE Score != 3
""", con) 




# Give reviews with Score>3 a positive rating, and reviews with a 
# score<3 a negative rating.
def partition(x):
    if x < 3:
        return 'negative'
    return 'positive'

# relabel reviews: Score < 3 becomes 'negative', Score > 3 becomes 'positive'
actualScore = filtered_data['Score']
positiveNegative = actualScore.map(partition) 
filtered_data['Score'] = positiveNegative
1. actualScore is just the Score column of the dataframe (a pandas Series).
2. actualScore.map(partition) applies (i.e. maps) the function partition to every element of actualScore and returns a new Series, positiveNegative.
3. filtered_data['Score'] = positiveNegative replaces the values in the Score column of the dataframe with the values from positiveNegative.
As a result, the Score column of the dataframe will contain only the values 'positive' (original Score > 3, since rows with Score = 3 were already excluded by the SQL query) and 'negative' (original Score < 3).
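
If it helps, here is a minimal sketch of those three lines run on a toy dataframe instead of the real database. The four scores are made up for illustration (they are not taken from the Amazon data), and the names toy and labels are my own; partition is copied from the question.

```python
import pandas as pd

def partition(x):
    # Same function as in the question: below 3 is negative, otherwise positive.
    if x < 3:
        return 'negative'
    return 'positive'

# Hypothetical stand-in for filtered_data: made-up scores, none equal to 3.
toy = pd.DataFrame({'Score': [1, 2, 4, 5]})

actualScore = toy['Score']                     # 1. take the Score column (a Series)
positiveNegative = actualScore.map(partition)  # 2. call partition on each element
toy['Score'] = positiveNegative                # 3. overwrite the column with the labels

labels = toy['Score'].tolist()
print(labels)  # ['negative', 'negative', 'positive', 'positive']
```

Note that map returns a new Series and does not change actualScore in place, which is why the third line is needed to write the labels back into the dataframe.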