Python Forum
Clustering for imbalanced data sets
Hi all,
I have a set of measurements with four features.

Two features are continuous (time and distance) and two are discrete.

We also know that the population consists of two groups: a minority group with around 10% of the samples and a majority group with the remaining 90%. However, we do not know which group each sample belongs to, which is why we want clustering to give us some hints on how to differentiate the two groups.

We would like to see whether a clustering algorithm can detect these two populations. Agglomerative clustering with cosine affinity performed by far the best, and I think this is because cosine affinity looks at the structure of the feature vectors (their direction) rather than at their numeric magnitudes.
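To make the setup concrete, here is a minimal from-scratch sketch of average-linkage agglomerative clustering with cosine distance, using only the standard library. The synthetic data is purely an assumption for illustration (it is not the real measurements): `make_point`, the 45/5 split, and the distance-to-time ratios of 2.0 and 0.3 are all hypothetical, chosen so that the two groups differ in the direction of their feature vectors rather than in their magnitude. The choice of average linkage is also an assumption, since the linkage used in the experiment above is not stated.

```python
import math
import random


def cosine_distance(a, b):
    """1 - cos(angle between a and b); depends on direction, not magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def agglomerative_average(points, n_clusters):
    """Naive average-linkage agglomerative clustering; returns one label per point."""
    n = len(points)
    # Pairwise cosine distances, computed once up front.
    dist = [[cosine_distance(points[i], points[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]  # every point starts as its own cluster
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the closest pair
        del clusters[b]
    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical data: [time, distance, discrete_1, discrete_2].
random.seed(0)


def make_point(ratio):
    t = random.uniform(10, 50)                 # time
    d = ratio * t * random.uniform(0.9, 1.1)   # distance, roughly ratio * time
    return [t, d, random.randint(0, 2), random.randint(0, 2)]


majority = [make_point(2.0) for _ in range(45)]  # about 90% of the samples
minority = [make_point(0.3) for _ in range(5)]   # about 10% of the samples
points = majority + minority

labels = agglomerative_average(points, n_clusters=2)
sizes = sorted(labels.count(k) for k in set(labels))
print("cluster sizes:", sizes)
```

In this toy case the groups separate because the time values overlap completely while the distance-to-time ratio differs, which is exactly the kind of structure cosine distance responds to. With real data, a library implementation such as scikit-learn's `AgglomerativeClustering` would be the practical choice; the naive loop above is only meant to show the mechanics.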

I would also like to try other techniques that can work on such an imbalanced dataset. Can you suggest which techniques might fit here?
