Python Forum
Grouping Data based on 30% bracket
#5
data = {
    "v1_rev": 3000,
    "v2_rev": 4444,
    "v4_rev": 234534,
    "v5_rev": 5665,
    "v6_rev": 66,
    "v7_rev": 66,
    "v3_rev": 66,
}

# Sort items in increasing order of their value
items = iter(sorted(data.items(), key=lambda x: x[1]))
threshold = 30

bins = []
bin_ = [next(items)]
for item in items:
    # What is the smallest value that can be in bin with item?
    start = item[1] * (200 - threshold) / (200 + threshold)
    if bin_[0][1] < start:
        bins.append(bin_.copy())
    bin_.append(item)
    while bin_[0][1] < start:
        bin_.pop(0)

if bin_:
    bins.append(bin_)

bins = [dict(bin_) for bin_ in bins]
print(bins)
Output:
[{'v6_rev': 66, 'v7_rev': 66, 'v3_rev': 66}, {'v1_rev': 3000}, {'v2_rev': 4444, 'v5_rev': 5665}, {'v4_rev': 234534}]
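A note on where the (200 - threshold) / (200 + threshold) factor comes from. This is a sketch that assumes pctDiff is the symmetric percent difference, i.e. the gap divided by the mean of the two values (the pct_diff and smallest_partner names below are mine, not from the original question). Under that assumption, solving pct_diff(a, b) <= threshold for the smaller value a gives a >= b * (200 - threshold) / (200 + threshold), so one multiplication per item replaces a pairwise comparison:

```python
def pct_diff(a: float, b: float) -> float:
    """Symmetric percent difference (assumed definition): gap relative to the mean."""
    return abs(a - b) / ((a + b) / 2) * 100


def smallest_partner(value: float, percent: float = 30) -> float:
    """Smallest number that is still within `percent` of `value`."""
    return value * (200 - percent) / (200 + percent)


low = smallest_partner(5665)
print(round(low, 2))                  # lower edge of the bracket that ends at 5665
print(round(pct_diff(low, 5665), 6))  # 30.0, exactly on the threshold
print(pct_diff(4444, 5665) <= 30)     # True: these two share a bin
print(pct_diff(3000, 4444) <= 30)     # False: these two are split
```

Because the items are sorted, only the smallest value in the current bin has to be compared against this lower edge.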
The same grouping logic written as a generator, tested with data that produces overlapping bins:
from typing import Any, Generator


def dict_grouper(
    value_dict: dict[Any, float], percent: float = 30
) -> Generator[dict[Any, float], None, None]:
    """Group values where each value in group is within "percent" of others."""
    scale = (200 - percent) / (200 + percent)
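    items = iter(sorted(value_dict.items(), key=lambda x: x[1]))
    first = next(items, None)
    if first is None:
        return  # Empty input: a bare next() here would surface as RuntimeError (PEP 479)
    grp = [first]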
    for item in items:
        start = item[1] * scale
        if grp[0][1] < start:
            yield dict(grp)
            grp = [x for x in grp[1:] if x[1] >= start]
        grp.append(item)

    if grp:
        yield dict(grp)


print(*dict_grouper(dict(zip(("ABCDEFG"), range(30, 100, 10)))), sep="\n")
Output:
{'A': 30, 'B': 40}
{'B': 40, 'C': 50}
{'C': 50, 'D': 60}
{'D': 60, 'E': 70, 'F': 80}
{'E': 70, 'F': 80, 'G': 90}
This should be fast, and it should stay fast as the data grows. With a combinations-based approach and 7 items there are 127 potential groups, and you would compute 742 pctDiffs. Both numbers grow rapidly as the number of items increases. My algorithm computes one lower bound per item (about 7 calculations here), so apart from the initial sort the number of calculations grows linearly with the item count.
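To put a number on "increase rapidly", here is a small sketch that counts candidate groups for a combinations-based approach. It assumes "potential groups" means every non-empty subset of the items, which is what gives 127 for 7 items; count_candidate_groups is a hypothetical helper, not something from the earlier posts.

```python
from itertools import combinations


def count_candidate_groups(n: int) -> int:
    """Count every non-empty subset of n items by enumerating the combinations."""
    return sum(1 for r in range(1, n + 1) for _ in combinations(range(n), r))


for n in (7, 10, 15):
    # The enumeration always matches the closed form 2**n - 1.
    print(n, count_candidate_groups(n), 2**n - 1)
```

The count doubles with every added item, while the sorted sweep above does one multiplication and one comparison per item.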


Messages In This Thread
Grouping Data based on 30% bracket - by purnima1 - Mar-09-2023, 05:42 PM
RE: Grouping Data based on 30% bracket - by deanhystad - Mar-10-2023, 07:38 PM
