Python Forum

What if a column has about 90% of data as outliers? - Asahavey17 - Aug-23-2021

I'm working on a prediction problem where one of the columns, when checked for outliers, shows that almost 90% of its values are outliers. What should be done in this scenario? Should the column be dropped, or should its outliers be treated the same way as in any other column?

Please advise.
Jiwaji University of Gwalior


RE: What if a column has about 90% of data as outliers? - jefsummers - Aug-23-2021

Methinks you need to change your definition of outlier. Consider treating as outliers only the values that lie more than 2 standard deviations from the mean. For roughly normally distributed data that should drop your outliers to about 5%, though the distribution of values will affect things too.
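
For illustration, here is a minimal sketch of the 2 standard deviation rule using only the standard library. The function name flag_outliers, the n_std parameter, and the two synthetic columns are assumptions made up for this example, not anything from the original question. One point in favour of this rule: by Chebyshev's inequality, no more than 25% of values can lie more than 2 standard deviations from the mean, whatever the distribution, so it cannot flag 90% of a column.

```python
import random
import statistics


def flag_outliers(values, n_std=2.0):
    """Return one boolean per value: True if the value lies more than
    n_std (population) standard deviations away from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column has no spread, so nothing is flagged.
        return [False] * len(values)
    return [abs(v - mean) > n_std * std for v in values]


# Synthetic data (an assumption for the demo): one roughly normal column
# and one right-skewed column, to show that the distribution matters.
random.seed(0)
normal_col = [random.gauss(50, 10) for _ in range(10_000)]
skewed_col = [random.expovariate(1.0) for _ in range(10_000)]

for name, col in [("normal", normal_col), ("skewed", skewed_col)]:
    flags = flag_outliers(col)
    share = sum(flags) / len(flags)
    print(f"{name}: {share:.1%} of values flagged as outliers")
```

If a check still reports something like 90% outliers, the threshold or the method behind that check is probably what needs a second look, rather than the column itself.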