Python Forum
What if a column has about 90% of data as outliers?
#1
I'm working on a prediction problem where one of the columns, when checked for outliers, shows that almost 90% of its values are outliers. What should be done in this scenario? Should the column be dropped, or should its outliers be treated the same way as in any other column?

Please advise.
Jiwaji University of Gwalior
#2
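Methinks you need to change your definition of outlier. Consider your outlier range as any values more than 2 standard deviations from the mean. If the column is roughly normally distributed, that should drop your outliers to about 5%, though the distribution of values will affect things too. Whatever the distribution, no more than 25% of values can sit that far from the mean, so a 90% outlier rate points at the rule rather than the data.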
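
To see what the 2 standard deviation rule flags on a given column, something like the rough sketch below might help. It is only an illustration: the function name `outlier_fraction`, the parameter `k`, and the two synthetic columns are assumptions made up for this example, not anything from the original data.

```python
import numpy as np


def outlier_fraction(values, k=2.0):
    """Share of values lying more than k standard deviations from the mean.

    Hypothetical helper for illustration; k=2.0 mirrors the rule above.
    """
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    std = values.std()
    if std == 0:
        # A constant column has no spread, so nothing can be flagged.
        return 0.0
    return float(np.mean(np.abs(values - mean) > k * std))


# Synthetic stand-ins for a real column (assumed data, seeded for repeatability).
rng = np.random.default_rng(0)
normal_col = rng.normal(loc=50, scale=10, size=10_000)   # roughly bell-shaped
skewed_col = rng.exponential(scale=10, size=10_000)      # long right tail

print("normal column:", outlier_fraction(normal_col))
print("skewed column:", outlier_fraction(skewed_col))
```

With a pandas DataFrame, passing `df["col"].dropna()` to the same function should work, since a Series converts cleanly to a NumPy array. If the share flagged on a real column is still large, the column is probably heavily skewed, and a transform (a log, for instance) may be worth trying before dropping anything.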