 Clustering for imbalanced data sets
Hi all,
I have a set of measurements with four features.

Two features are continuous (time and distance) and two are discrete.

We also know that the population consists of two groups: a minority with around 10% of the samples and a majority with the remaining 90%. However, we do not know which group each sample belongs to, which is why we want clustering to give us some hints on how to differentiate the two groups.

We would like to see whether a clustering algorithm can recover these two populations. Of the methods I tried, agglomerative clustering with cosine affinity performed by far the best, and I think this is because cosine affinity responds to the structure (direction) of the feature vectors rather than to their numeric magnitudes.
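To make the cosine idea concrete, here is a minimal sketch of average-linkage agglomerative clustering on cosine distance. It is not my actual pipeline: the data is synthetic (45 majority and 5 minority samples), the names `cosine_distance`, `agglomerative_average` and `make_sample` are made up for illustration, and the choice of average linkage is an assumption. It is written in plain Python so it runs without any library.

```python
import math
import random


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def agglomerative_average(points, n_clusters):
    """Naive average-linkage agglomerative clustering (fine for small n)."""
    n = len(points)
    dist = [[cosine_distance(points[i], points[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with one cluster per sample
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j]
                            for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Synthetic stand-in for the real measurements (assumption): the two
# groups differ in the ratio of time to distance, not just in magnitude.
rng = random.Random(0)


def make_sample(minority):
    if minority:
        time, distance = rng.gauss(1.0, 0.2), rng.gauss(10.0, 1.0)
    else:
        time, distance = rng.gauss(10.0, 1.0), rng.gauss(1.0, 0.2)
    # two discrete features, identically distributed in both groups
    return [time, distance, rng.randint(0, 1), rng.randint(0, 2)]


truth = [0] * 45 + [1] * 5  # roughly a 90% / 10% split
X = [make_sample(t == 1) for t in truth]
labels = agglomerative_average(X, 2)
sizes = sorted(labels.count(k) for k in set(labels))
print(sizes)
```

In this toy setup the groups point in clearly different directions in feature space, which is the situation where cosine distance should help. Real data will be less clean, and features on very different scales may need rescaling first.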

I would also like to try other techniques that can work on such an imbalanced dataset. Can you suggest which techniques might fit here?
