Jun-06-2018, 10:53 AM
Hello,
I am trying to learn data analysis with Python using the pandas library, and I am trying to define a function that filters out outliers.
My pandas data frame is "irisset" and the column I want to clean the outliers from is "sepal-width". I want to clean it by deleting values more than 1.5 times the interquartile range above the upper quartile or below the lower quartile.
This is what I've got so far:
    def outlier_remove(dfname, col):
        q1 = dfname[col].quantile(0.25)
        q3 = dfname[col].quantile(0.75)
        iqr = q3 - q1  # Interquartile range
        fence_low = q1 - 1.5 * iqr
        fence_high = q3 + 1.5 * iqr
        dfname = dfname.loc[(irisset[col] > fence_low) & (dfname[col] < fence_high)]

    outlier_remove(irisset, 'sepal-width')

Basically, I want irisset to be replaced by the new dataset that I am creating. For instance, if I plot it, I get the same old dataset. However, if I plot within the function like this:
    def outlier_remove(dfname, col):
        q1 = dfname[col].quantile(0.25)
        q3 = dfname[col].quantile(0.75)
        iqr = q3 - q1  # Interquartile range
        fence_low = q1 - 1.5 * iqr
        fence_high = q3 + 1.5 * iqr
        dfname = dfname.loc[(irisset[col] > fence_low) & (dfname[col] < fence_high)]
        sb.boxplot(dfname)
        plt.show()
Then the plot comes up without outliers. I was wondering how I can replace the original dataset with the filtered one.
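A likely explanation for the behaviour described above: assigning to dfname inside the function only rebinds the local name, so the irisset object in the calling code is never touched. One possible fix, sketched below, is to return the filtered frame and reassign it at the call site. The sample values are a hypothetical stand-in for the real iris data, the mask uses the function's own argument on both sides instead of the global irisset, and the strict inequalities are carried over from the original function.

```python
import pandas as pd


def outlier_remove(df, col):
    """Return a copy of df without rows whose `col` value lies outside the 1.5*IQR fences."""
    q1 = df[col].quantile(0.25)
    q3 = df[col].quantile(0.75)
    iqr = q3 - q1  # Interquartile range
    fence_low = q1 - 1.5 * iqr
    fence_high = q3 + 1.5 * iqr
    # Build the mask from the argument itself, not from a global data frame
    keep = (df[col] > fence_low) & (df[col] < fence_high)
    return df.loc[keep].copy()


# Hypothetical stand-in for the iris data, with one obvious outlier (9.0)
irisset = pd.DataFrame({"sepal-width": [3.0, 3.1, 2.9, 3.2, 3.0, 2.8, 3.3, 9.0]})

# Reassign the result so that irisset now refers to the filtered frame
irisset = outlier_remove(irisset, "sepal-width")
```

Any plot made from irisset after the reassignment should then use the filtered rows. Returning a new frame rather than modifying the argument in place also keeps the original data available if it is stored under another name first.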