Python Forum
unsupervised learning for distribution of outliers
#1
Hi all,
I have two questions related to unsupervised learning.

1. How can I try a few unsupervised learning algorithms and get a measure of the purity of the resulting clustering? In other words, how clear are the clusters that were found, and how accurately do they describe the phenomenon?

2. Which unsupervised algorithm can capture this kind of peak in the data set, i.e. the data points that seem to be moving away from the main body where all the other points are?
Can you suggest a clustering technique that works with 3-dimensional data?
I had to upload my picture here:
https://postimg.cc/3y2jx5xR


Thanks a lot for your reply.
Regards
Alex
#2
(Jul-30-2019, 05:36 PM)dervast Wrote: Can you provide some help for a clustering technique that can work with 3 dimensional data?

Almost all clustering algorithms can work with multidimensional data, including the 3D case. Take a look at scikit-learn, e.g. this example; the algorithms shown there work with multidimensional data too.

(Jul-30-2019, 05:36 PM)dervast Wrote: How clear in other words the found clusters are and how accurately describe the phenomena?

You can use the same approach as in the Elbow method. This approach (the ratio of variances) lets you estimate the "quality" of a cluster structure. You can also compare different cluster structures using the Rand Index (or the Adjusted Rand Index).
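
A rough sketch of both ideas with scikit-learn, in case it helps. The data here is synthetic 3D data from make_blobs (an assumption, not your data), and the choice of KMeans and AgglomerativeClustering as the two structures to compare is only for illustration:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic 3D points standing in for the real measurements (assumption).
X, _ = make_blobs(n_samples=300, centers=3, n_features=3, random_state=0)

# Total sum of squares around the global mean.
total_ss = ((X - X.mean(axis=0)) ** 2).sum()

# Elbow method: for each k, record the within-cluster sum of squares
# (inertia) and the ratio of between-cluster variance to total variance.
inertias = {}
explained = {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_
    explained[k] = 1.0 - km.inertia_ / total_ss

for k in inertias:
    print(f"k={k}  inertia={inertias[k]:.1f}  explained={explained[k]:.3f}")

# Adjusted Rand Index: agreement between two cluster structures.
labels_km = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
labels_agg = AgglomerativeClustering(n_clusters=3).fit_predict(X)
ari = adjusted_rand_score(labels_km, labels_agg)
print(f"ARI between k-means and agglomerative labels: {ari:.3f}")
```

The "elbow" is the k after which the explained ratio stops improving much. An ARI close to 1 means the two algorithms found nearly the same partition; values near 0 mean the agreement is about what you would expect by chance.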
#3
Thanks for your reply.
Actually, I would like a clustering technique that can see the outliers highlighted in red as a cluster of their own.

The picture is here:
https://i.postimg.cc/mrx2R4Q4/image.png

Most clustering schemes will not see that part as a separate cluster. In the picture below, k-means completely misses it and treats it all as one cluster:
https://i.postimg.cc/GhF1YDDW/image.png

Can you please suggest a method here?

Thanks
Alex
#4
You can start with the outlier detection algorithms provided by scikit-learn. Something based on probability density estimation could be helpful here, e.g. EllipticEnvelope.
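
A minimal sketch of how that might look. The data is made up (a Gaussian main body plus a small "spur" of points moving away from it, which is only my guess at the shape in your picture), and contamination=0.1 is an assumed outlier fraction that you would need to tune:

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)

# Main body: 300 points from a standard 3D Gaussian (assumption).
body = rng.normal(0.0, 1.0, size=(300, 3))

# Spur: 30 points spread along the diagonal, away from the main body.
t = rng.uniform(3.0, 6.0, size=(30, 1))
spur = t * np.ones((1, 3)) + rng.normal(0.0, 0.2, size=(30, 3))

X = np.vstack([body, spur])

# Fit a robust Gaussian envelope; contamination is the expected
# fraction of outliers (assumed here, not estimated from the data).
detector = EllipticEnvelope(contamination=0.1, random_state=0)
labels = detector.fit_predict(X)  # +1 = inlier, -1 = outlier

outliers = X[labels == -1]
spur_flagged = np.mean(labels[-30:] == -1)  # share of the spur flagged
print(f"{len(outliers)} points flagged as outliers")
print(f"fraction of spur points flagged: {spur_flagged:.2f}")
```

The points labelled -1 can then be treated as their own group. Note that EllipticEnvelope assumes the main body is roughly Gaussian, so it may not suit data with several dense regions.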