Python Forum
binning data
#1
Hi,

I need to write a function called 'bin_data(data, binsize)' that takes a list of floats (stored in the variable 'data') and a float (stored in the variable 'binsize') and returns a dictionary representing the 'binned_data'.
The keys of the dictionary represent the bins, and the values are ints giving the number of data items in each bin.
Bin ranges include the lower value and exclude the upper value, e.g. with a binsize of 5 the first bin holds values from 0 up to 4.99..., and 5.0 itself goes into the next bin.
example output:
{'0.0 - 5.0': 4, '5.0 - 10.0': 6, '10.0 - 15.0': 4}
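The "lower inclusive, upper exclusive" rule can be illustrated with floor division. This is only a sketch: `bin_label` is a hypothetical helper name, and it assumes bins start at 0.0 and that keys follow the 'lower - upper' format of the example output.

```python
def bin_label(value, binsize):
    """Return the key of the [lower, upper) bin that contains value."""
    # Floor division gives the index of the bin, counting from the bin that starts at 0.0.
    index = int(value // binsize)
    lower = index * binsize
    upper = lower + binsize
    return f"{lower} - {upper}"


print(bin_label(4.99, 5.0))  # just below the boundary, stays in the first bin
print(bin_label(5.0, 5.0))   # exactly on the boundary, moves to the next bin
```

With bin sizes that have no exact binary representation (0.1, for example), floating-point error can push a value that sits exactly on a boundary into the neighbouring bin, so boundary cases are worth checking.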

The function needs to sort the data into the dictionary according to whatever number is assigned to 'binsize', which may change.
I have managed to create a dictionary whose keys represent the bins, but I can't figure out how to sort the values in the list 'data' into the ranges determined by the bins, count the number of values in each bin, and then store that count as the value for each key.
Here's what I have so far:

data = [90.0, 69.0, 58.0, 57.0, 49.0, 55.0, 52.0, 35.0, 43.0, 43.0, 50.0, 0.0, 32.0, 74.0, 48.0, 59.0, 53.0, 80.0, 61.0, 59.0, 67.0, 48.0, 65.0, 30.0, 60.0, 61.0, 37.0, 39.0, 60.0, 41.0, 53.0, 58.0, 37.0, 44.0, 80.0, 55.0, 42.0, 41.0, 26.0, 45.0, 28.0, 46.0, 33.0, 74.0, 42.0, 65.0, 80.0, 38.0, 82.0, 64.0, 33.0, 61.0, 32.0, 51.0, 34.0, 20.0, 47.0, 99.0, 51.0, 53.0, 72.0, 74.0, 22.0, 38.0, 56.0, 61.0, 92.0, 63.0, 69.0, 81.0, 30.0, 14.0, 13.0, 30.0, 30.0, 89.0, 37.0, 25.0, 7.0, 65.0, 47.0, 76.0, 41.0, 43.0, 54.0, 64.0, 78.0, 45.0, 27.0, 41.0, 25.0, 0.0, 63.0, 69.0, 48.0, 74.0, 28.0, 77.0, 47.0]

data.sort()

bin_size = '5'
binsize = float(bin_size)

binned_data = {}

def bin_data(data, binsize):
  numberofbins = int(round((max(data) - min(data)) / binsize)+0.5)#why is it not rounding without the 0.5
  for i in range(numberofbins):
      bin_lower = (i * binsize)
      bin_upper = bin_lower + binsize
      binlower = int(bin_lower)
      binupper = int(bin_upper)
      binrange = (str(binlower) + '-' + str(binupper))
      binned_data.update({binrange: []})

bin_data(data,binsize)
print(binned_data)
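On the rounding question in the code comment above: in Python 3, round() rounds to the nearest integer (with ties going to the even neighbour), so it can round down, and because the 0.5 is added after round() has already run, int() simply truncates it away again. A possible alternative, assuming the bins start at 0 and the data is non-negative as in the code above, is to derive the count from the index of the bin that holds the largest value. The name `number_of_bins` is made up for this sketch.

```python
def number_of_bins(data, binsize):
    """Number of [lower, upper) bins of width binsize, starting at 0, needed to cover data."""
    # The largest value lands in bin index max // binsize, and indices start at 0,
    # so one more than that index is the number of bins required.
    return int(max(data) // binsize) + 1


# round() picks the nearest integer, so it can go down as well as up.
print(round(18.2), round(19.8))

print(number_of_bins([0.0, 91.0], 5.0))   # 91.0 falls in the bin 90-95
print(number_of_bins([0.0, 100.0], 5.0))  # 100.0 starts a new bin, 100-105
```

For a maximum of 91.0 and a bin size of 5, the original expression would give 18 bins, which stops at 90 and leaves 91.0 without a bin.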
#2
I am not sure if I understand the problem exactly. Here is my solution with comments.

def make_bins(data, bin_size):

  binned_data = {}  # variable to store the output
  bin_low = 0  # initial lower limit of the bin
  bin_high = bin_size  # initial upper limit of the bin
  key = str(bin_low) + '-' + str(bin_high)  # variable stores the key name
  i = 1  # counter for the number of floats that fit in the current bin
  
  for data_piece in data:  # iterate over the sorted data

    if bin_low <= data_piece < bin_high:  # if the current data fits the bin, lower inclusive and upper exclusive
      binned_data[key] = i  # store the counter under the key
      i += 1  # increment the counter

    else:  # if the current data doesn't fit the bin
      bin_low += bin_size
      bin_high += bin_size  # increment the bin limits by the bin size
      key = str(bin_low) + '-' + str(bin_high)  # generate a new key
      binned_data[key] = 1  # first value of the new bin (assumes it falls in the very next bin)
      i = 2  # the next value that fits this bin will be the second one

  print(binned_data)
  return binned_data
#3
Corrected: it now works if the data sequence starts with a value higher than the bin size.

def make_bins(data, bin_size):
 
  binned_data = {}  # variable to store the output
  bin_low = 0  # initial lower limit of the bin (assumes non-negative data)
  bin_high = bin_size  # initial upper limit of the bin
   
  for data_piece in sorted(data):  # iterate over the data in ascending order

    while data_piece >= bin_high:  # move up until the value fits the bin, skipping empty bins
      bin_low += bin_size
      bin_high += bin_size
 
    # here bin_low <= data_piece < bin_high, lower inclusive and upper exclusive
    key = str(bin_low) + '-' + str(bin_high)  # key name of the current bin
    binned_data[key] = binned_data.get(key, 0) + 1  # count the value in its bin
 
  return binned_data
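For comparison, here is a sketch of the bin_data(data, binsize) function from the original question that does not depend on the data being sorted. It is one possible approach, not the only one, and it makes a few assumptions: keys use the 'lower - upper' format of the example output, bins are aligned to multiples of binsize, and empty bins between the lowest and highest occupied bin are reported with a count of 0.

```python
from collections import Counter


def bin_data(data, binsize):
    """Count how many values fall into each [lower, upper) bin of width binsize."""
    if not data:
        return {}
    # Floor division maps every value to the index of its bin.
    counts = Counter(int(value // binsize) for value in data)
    binned_data = {}
    # Walk from the lowest to the highest occupied bin so that empty bins show up as 0.
    for index in range(min(counts), max(counts) + 1):
        lower = index * binsize
        key = f"{lower} - {lower + binsize}"
        binned_data[key] = counts.get(index, 0)
    return binned_data


print(bin_data([0.0, 1.0, 4.99, 5.0, 12.0], 5.0))
```

Because each value is assigned to a bin by arithmetic rather than by walking the list in order, the input can be in any order and the bin size can be changed freely.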