 Sample based on the distribution of a feature to create more balanced data set
Hi all,

I am stuck on a classification problem with unlabeled data. One issue is that the dataset is imbalanced, and I would like to rebalance it somewhat to make the job easier for the clustering algorithms.

What I can use, though, is that one of the features we know to be important for the clustering is itself imbalanced. In the figure below, where the x-axis is speed, you can see that the dataset consists mostly of low speeds. [Image: distribution of the speed feature]

Is it possible, based on this distribution, to sample the dataset more evenly? For example, by picking a smaller percentage of the entries at low speeds and a larger percentage of the entries at higher speeds?

The sklearn package does not seem to have such functionality. Could you please help me find the relevant packages? I am quite sure that your answers will help many others besides me.

Regards, Alex
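
For reference, the kind of sampling asked about above can be sketched with pandas alone. This is only a rough sketch under assumptions: the data is taken to be a DataFrame with a numeric column (called `speed` here for illustration), the function name `balance_by_feature` and its parameters are made up for this example, and equal-width bins are just one possible choice. Each row is weighted by the inverse of the number of rows in its bin, so rows from sparsely populated speed ranges are picked more often.

```python
import numpy as np
import pandas as pd


def balance_by_feature(df, column, n_bins=10, n_samples=1000, random_state=0):
    """Sample rows so that the bins of `column` are represented more evenly.

    Hypothetical helper for illustration, not part of any library.
    """
    # Assign every row to one of n_bins equal-width bins (integer codes).
    codes = pd.cut(df[column], bins=n_bins, labels=False)
    # Number of rows that fall into each bin.
    counts = codes.value_counts()
    # Inverse-frequency weight: rows in crowded bins get small weights.
    weights = 1.0 / codes.map(counts)
    # DataFrame.sample normalizes the weights so that they sum to 1.
    return df.sample(n=n_samples, weights=weights, random_state=random_state)


if __name__ == "__main__":
    # Synthetic, skewed "speed" data standing in for the real dataset.
    rng = np.random.default_rng(0)
    data = pd.DataFrame({"speed": rng.exponential(scale=10.0, size=10_000)})
    balanced = balance_by_feature(data, "speed", n_bins=10, n_samples=1000)
    print(data["speed"].describe())
    print(balanced["speed"].describe())
```

Because the sampling is done without replacement, very sparse bins can run out of rows, so the result is more balanced rather than perfectly uniform. The number of bins and the sample size would need tuning for the real data.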
I would put a batch norm per channel on the first layer and give it a go.
