 Clustering for imbalanced data sets
Hi all,
I have a set of measurements with four features.

Two features are continuous (time and distance) and two are discrete.

We also know that the population consists of two groups: a minority group with around 10% of the samples and a majority group with the remaining 90%. However, we do not know which group each sample belongs to, so we hope clustering can give us some hints on how to differentiate the two groups.

We would like to see whether a clustering algorithm can detect these two populations. Agglomerative clustering with cosine affinity performed by far the best. I think this is because cosine affinity compares the pattern of each sample's feature vector (its direction) rather than the absolute numeric values (its magnitude).

I would also like to try other techniques that can handle such an imbalanced dataset. Can you suggest which techniques might fit here?
