Python Forum

Simple Python function help needed
Hello,

I am trying to learn data analysis in Python using the pandas library, and I was trying to define a function to filter out outliers.

My pandas DataFrame is "irisset" and the column I want to clean the outliers from is "sepal-width". I want to clean it by deleting values that lie more than 1.5 times the interquartile range (IQR) above the upper quartile or below the lower quartile.
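To make the fence arithmetic concrete, here is a minimal sketch on a small made-up Series (the values are hypothetical, not taken from the iris data). `Series.quantile` uses linear interpolation by default, which is what produces the in-between quartile values.

```python
import pandas as pd

# Made-up sample values; 10.0 is deliberately far from the rest
s = pd.Series([2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 10.0])

q1 = s.quantile(0.25)         # 2.75 (interpolated between 2.5 and 3.0)
q3 = s.quantile(0.75)         # 3.75 (interpolated between 3.5 and 4.0)
iqr = q3 - q1                 # 1.0
fence_low = q1 - 1.5 * iqr    # 1.25
fence_high = q3 + 1.5 * iqr   # 5.25

# Keep only values strictly between the two fences
kept = s[(s > fence_low) & (s < fence_high)]
print(kept.tolist())  # [2.0, 2.5, 3.0, 3.0, 3.5, 4.0]
```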

This is what I've gotten so far:

def outlier_remove(dfname, col):
    q1 = dfname[col].quantile(0.25)   # lower quartile
    q3 = dfname[col].quantile(0.75)   # upper quartile
    iqr = q3 - q1                     # interquartile range
    fence_low = q1 - 1.5 * iqr
    fence_high = q3 + 1.5 * iqr
    # This rebinds the local name dfname; the filtered frame is not returned
    dfname = dfname.loc[(dfname[col] > fence_low) & (dfname[col] < fence_high)]

outlier_remove(irisset, 'sepal-width')

Basically, I want irisset to be replaced by the new dataset that the function creates. However, when I plot irisset afterwards, I still get the original data set. If I plot inside the function instead, like this:
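import matplotlib.pyplot as plt
import seaborn as sb

def outlier_remove(dfname, col):
    q1 = dfname[col].quantile(0.25)   # lower quartile
    q3 = dfname[col].quantile(0.75)   # upper quartile
    iqr = q3 - q1                     # interquartile range
    fence_low = q1 - 1.5 * iqr
    fence_high = q3 + 1.5 * iqr
    dfname = dfname.loc[(dfname[col] > fence_low) & (dfname[col] < fence_high)]
    # Plotting here shows the filtered local copy
    sb.boxplot(data=dfname)
    plt.show()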

Then the plot comes up without outliers. How can I replace the original dataset with the filtered one?
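The behaviour described above comes from how Python passes arguments: the function receives a reference to the same DataFrame object, but `dfname = ...` only rebinds the local name, so the caller's `irisset` never changes. A minimal sketch contrasting the two cases, using hypothetical helper names `rebind` and `mutate` and a toy column `x` (none of these come from the thread):

```python
import pandas as pd


def rebind(df):
    # Assigning to the parameter only rebinds the local name `df`;
    # the caller's DataFrame is left unchanged.
    df = df[df["x"] > 1]


def mutate(df):
    # An in-place method changes the object the caller also refers to.
    df.drop(df[df["x"] <= 1].index, inplace=True)


data = pd.DataFrame({"x": [0, 1, 2, 3]})

rebind(data)
print(len(data))  # 4: the caller still sees every row

mutate(data)
print(len(data))  # 2: rows 0 and 1 were dropped from the shared object
```

Returning the filtered frame and reassigning it at the call site avoids mutating the caller's data behind its back.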
Try returning the filtered DataFrame and assigning the result back to irisset:
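import matplotlib.pyplot as plt
import seaborn as sb

def outlier_remove(dfname, col):
    q1 = dfname[col].quantile(0.25)   # lower quartile
    q3 = dfname[col].quantile(0.75)   # upper quartile
    iqr = q3 - q1                     # interquartile range
    fence_low = q1 - 1.5 * iqr
    fence_high = q3 + 1.5 * iqr
    # Return the filtered frame so the caller can keep it
    return dfname.loc[(dfname[col] > fence_low) & (dfname[col] < fence_high)]

# Rebind the caller's name to the filtered result
irisset = outlier_remove(irisset, 'sepal-width')
sb.boxplot(data=irisset)
plt.show()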
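A plot-free sketch of the returning version that runs on its own, assuming a small made-up DataFrame in place of the real iris data (the numbers and the `species` values are hypothetical). The mask is built from `dfname` only, so the function does not depend on a global `irisset`.

```python
import pandas as pd


def outlier_remove(dfname, col):
    """Return the rows of dfname whose `col` lies strictly inside the 1.5*IQR fences."""
    q1 = dfname[col].quantile(0.25)
    q3 = dfname[col].quantile(0.75)
    iqr = q3 - q1
    fence_low = q1 - 1.5 * iqr
    fence_high = q3 + 1.5 * iqr
    # Build the mask from the argument, so any DataFrame works
    mask = (dfname[col] > fence_low) & (dfname[col] < fence_high)
    return dfname.loc[mask]


# Hypothetical stand-in for the iris data; values are made up for illustration
irisset = pd.DataFrame({
    "sepal-width": [1.0, 3.0, 3.1, 3.2, 3.3, 3.4, 6.0],
    "species": ["setosa"] * 7,
})

irisset = outlier_remove(irisset, "sepal-width")
print(irisset["sepal-width"].tolist())  # [3.0, 3.1, 3.2, 3.3, 3.4]
print(irisset.index.tolist())           # [1, 2, 3, 4, 5]
```

The filtered frame keeps the original index labels (here 1 to 5); call `.reset_index(drop=True)` on the result if consecutive labels are needed.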
Haha, that made me feel stupid :p Thank you!