Python Forum
need help with binning data
#1
Hi there, I need help with binning some data for a homework question.
Say I have a list, [2, 3, 4, 5, 6, 8, 10, 12, 14, 16, 19, 21]. I need to sort the values into bins of width 4, starting with the lowest value.
The end format should be a dictionary, with how often the numbers appear inside each group:

Ideal output: {'2-6': 5, '6-10': 3, '10-14': 3, '14-18': 2, '18-22': 2}

I've written the following code but I just don't know how to count how often a number 'fits' into a bin, that is, adding the counted value to the key in the dictionary. I have managed to create a dictionary whose keys represent the bins, but I can't figure out how to sort the data in the list 'data' into the range determined by the bins.

That is, I can't figure out how to count the number of values in each bin, and then append that number to the values of the dictionary.

Can someone please help? Greatly appreciated :)

def bin_data():
    data = [2, 3, 4, 5, 6, 8, 10, 12, 14, 16, 19, 21]
    data.sort()
    binsize = 4
    binned_data = {}

    def bin_data(data, binsize):
        numberofbins = int(round((max(data) - min(data)) / binsize)+0.5)
        for i in range(numberofbins):
            bin_lower = (i * binsize) + data[0]
            bin_upper = bin_lower + binsize
            binlower = int(bin_lower)
            binupper = int(bin_upper)
            binrange = (str(binlower) + '-' + str(binupper))

            binned_data.update({binrange: 1})

###Ideal Output: {'2-6': 5, '6-10': 3, '10-14': 3, '14-18': 2, '18-22': 2}
###x is count of frequency
Reply
#2
Before working out the count, I wonder why the 'ideal output' has overlapping ranges.
Does 6 belong in the first or the second bin?

I would fix that before counting.

Paul
It is more important to do the right thing, than to do the thing right.(P.Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
Reply
#3
I would say that using a function this way goes against the core principle of reuse: the function takes no arguments and the data is hardcoded. Why put it in a function instead of just writing the code?
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Reply
#4
First things first. If your minimum value is 2 and your bin size is 4, your ranges should be 2-5, 6-9, 10-13, ... As written, your bins overlap, assuming your values are integers. If you are binning float values your ranges are fine, but you need to decide where to put numbers that exactly match a bin's minimum or maximum value.
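
To make the non-overlapping ranges concrete, here is a minimal sketch that only builds the labels. The name bin_labels is made up for illustration, and it assumes integer data where each label shows an inclusive upper bound (lower + binsize - 1).

```python
def bin_labels(data, binsize):
    """Return 'low-high' labels for non-overlapping integer bins.

    Hypothetical helper; assumes integer values and inclusive bounds.
    """
    labels = []
    lower = min(data)
    highest = max(data)
    # Step through the bin lower bounds until the largest value is covered.
    while lower <= highest:
        labels.append(f"{lower}-{lower + binsize - 1}")
        lower += binsize
    return labels


data = [2, 3, 4, 5, 6, 8, 10, 12, 14, 16, 19, 21]
print(bin_labels(data, 4))
# ['2-5', '6-9', '10-13', '14-17', '18-21']
```

With these labels every integer lands in exactly one bin, so 6 belongs only to '6-9'.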

What you are doing in your code so far is creating bins and setting the count for each bin to 1 (should be 0). Is this necessary? Do you want to create potentially empty bins? If not, there is no need for this step and you can create bins as needed while looping through the data (which doesn't need to be sorted).

What remains is putting the data in bins. I bet there is an equation to calculate the starting bin value for any number given the bin size and starting (minimum) value.
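
One possible version of that equation uses floor division: lower = start + ((value - start) // binsize) * binsize. The sketch below is not the only way to do it. The name count_bins is hypothetical, it assumes integer data and non-overlapping bins with inclusive labels, and it creates bins only when a value falls into them.

```python
def count_bins(data, binsize):
    """Count how many values fall into each non-overlapping bin.

    Hypothetical helper; assumes integer values. Bins are created on
    demand, so empty bins never appear in the result.
    """
    start = min(data)
    counts = {}
    for value in data:
        # Floor division gives the index of the bin; scale back to its lower bound.
        lower = start + ((value - start) // binsize) * binsize
        label = f"{lower}-{lower + binsize - 1}"
        counts[label] = counts.get(label, 0) + 1
    return counts


data = [2, 3, 4, 5, 6, 8, 10, 12, 14, 16, 19, 21]
print(count_bins(data, 4))
# {'2-5': 4, '6-9': 2, '10-13': 2, '14-17': 2, '18-21': 2}
```

The counts differ from the 'ideal output' in the first post because each value is counted once here, instead of boundary values such as 6, 10 and 14 being counted in two bins.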
Reply

