Python Forum

What if a column has about 90% of data as outliers?
I'm working on a prediction problem where one of the columns, when checked for outliers, shows that almost 90% of its values are outliers. What should be done in this scenario? Should the column be dropped, or should we treat its outliers the same way as in any other column?

Please advise.
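Methinks you need to change your definition of outlier. If a rule flags 90% of a column, it is describing the column's normal spread rather than unusual values. Try treating as outliers only the values that lie more than 2 standard deviations from the mean. For normally distributed data that flags roughly 5% of values (about 4.6%), though the actual share depends on how your values are distributed.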
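A minimal sketch of the 2-standard-deviation rule, using only the Python standard library. The function name `outlier_fraction`, the parameter `k`, and the generated sample data are illustrative assumptions, not anything from the thread. It reports what share of a column lies outside mean ± k·std, so you can check how many values a given rule would flag before deciding to drop anything.

```python
import random
import statistics


def outlier_fraction(values, k=2.0):
    """Return the share of values lying more than k standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return 0.0  # all values identical: nothing is an outlier
    outliers = [v for v in values if abs(v - mean) > k * std]
    return len(outliers) / len(values)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    # Hypothetical columns: one roughly normal, one heavily right-skewed.
    normal = [rng.gauss(0, 1) for _ in range(100_000)]
    skewed = [rng.lognormvariate(0, 1) for _ in range(100_000)]
    print(f"normal: {outlier_fraction(normal):.1%}")  # expect roughly 5% for Gaussian data
    print(f"skewed: {outlier_fraction(skewed):.1%}")  # share depends on the distribution's shape
```

With pandas you could pass a column as `outlier_fraction(df["col"].dropna().tolist())`, where `df` and `"col"` stand in for your own DataFrame and column name. Lowering `k` flags more values and raising it flags fewer, so it is worth printing the fraction for a few values of `k` before settling on one.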