Python Forum

Clustering for imbalanced data sets - dervast - Sep-25-2019

Hi all,
I have a set of measurements with four features: two are continuous (time and distance) and two are discrete.

We also know that the population consists of two groups: a minority with around 10% of the samples and a majority with the remaining 90%. However, we do not know which group each sample belongs to, which is why we want clustering to give us some hints on how to tell the two groups apart.

We would like to see whether a clustering algorithm can recover these two populations. So far, agglomerative clustering with cosine affinity has performed by far the best. I think this is because cosine affinity compares the direction of the feature vectors (the relative proportions of the features) rather than their absolute numeric values.
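
The idea that cosine affinity looks at structure rather than magnitude can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the actual setup: the data is synthetic (54 majority and 6 minority samples whose distance/time ratio differs), average linkage is assumed because the linkage is not stated above, and the names cosine_distance, agglomerative and make_sample are made up for this example. In practice a library implementation such as scikit-learn's AgglomerativeClustering would be the usual choice.

```python
import math
import random


def cosine_distance(a, b):
    """Return 1 - cosine similarity; depends on direction, not vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def agglomerative(points, n_clusters, dist):
    """Naive average-linkage agglomerative clustering; one label per point."""
    d = [[dist(p, q) for q in points] for p in points]  # pairwise distances
    clusters = [[i] for i in range(len(points))]        # start from singletons
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(d[a][b] for a in clusters[i] for b in clusters[j])
                avg = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # merge the closest pair
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


def make_sample(ratio):
    """Hypothetical sample: (time, distance, discrete_1, discrete_2)."""
    t = random.uniform(10, 100)                      # magnitude varies widely
    distance = t * ratio * random.uniform(0.9, 1.1)  # ratio defines the group
    return (t, distance, random.randint(0, 1), random.randint(0, 1))


random.seed(0)
majority = [make_sample(2.0) for _ in range(54)]  # ~90% of the samples
minority = [make_sample(0.2) for _ in range(6)]   # ~10% of the samples
points = majority + minority
truth = [0] * 54 + [1] * 6  # known here only because the data is synthetic

# Same direction, different magnitude: cosine distance is (numerically) zero.
print("scaled copy:", cosine_distance((1.0, 2.0), (10.0, 20.0)))

labels = agglomerative(points, n_clusters=2, dist=cosine_distance)
sizes = sorted(labels.count(c) for c in set(labels))
print("cluster sizes:", sizes)
```

Because the two synthetic groups differ in the direction of their feature vectors and not in their size, the cluster sizes should reflect the 6/54 split even though the time and distance values span a wide range. Whether real measurements separate this cleanly depends on how the features are scaled and on how much the groups actually differ in direction.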

I would also like to try other techniques that can work on an imbalanced data set like this one. Can you suggest which techniques might fit here?
Regards
Alex