Grouping Data based on 30% bracket

**deanhystad** · (This post was last modified: Mar-10-2023, 07:38 PM by deanhystad.)

```python
data = {
    "v1_rev": 3000,
    "v2_rev": 4444,
    "v4_rev": 234534,
    "v5_rev": 5665,
    "v6_rev": 66,
    "v7_rev": 66,
    "v3_rev": 66,
}

threshold = 30  # largest allowed percent difference inside a bin

# Sort items in increasing order of their value
items = sorted(data.items(), key=lambda x: x[1])

bins = []
bin_ = []
for item in items:
    # What is the smallest value that can be in a bin with item?
    start = item[1] * (200 - threshold) / (200 + threshold)
    if bin_ and bin_[0][1] < start:
        # The smallest member cannot go with item, so this bin is complete
        bins.append(bin_.copy())
    bin_.append(item)
    # Drop members that are too small to share a bin with item
    while bin_[0][1] < start:
        bin_.pop(0)

if bin_:
    bins.append(bin_)

bins = [dict(bin_) for bin_ in bins]
print(bins)
```

Output:
```
[{'v6_rev': 66, 'v7_rev': 66, 'v3_rev': 66}, {'v1_rev': 3000}, {'v2_rev': 4444, 'v5_rev': 5665}, {'v4_rev': 234534}]
```
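The scale factor (200 - threshold) / (200 + threshold) comes from inverting the percent difference. Assuming pctDiff is the absolute difference divided by the mean of the two values (the usual definition; the thread's own pctDiff function is not shown here), for two values a ≤ b and a threshold t:

```latex
\begin{aligned}
\frac{100\,(b - a)}{(a + b)/2} \le t
&\iff 200\,(b - a) \le t\,(a + b) \\
&\iff b\,(200 - t) \le a\,(200 + t) \\
&\iff a \ge b \cdot \frac{200 - t}{200 + t}
\end{aligned}
```

With t = 30 the factor is 170/230 ≈ 0.739. Since 4444 × 0.739 ≈ 3285 and 3000 is smaller than that, v1_rev and v2_rev end up in separate bins in the output above.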

The same algorithm written as a generator, tested with values whose bins overlap:

```python
from typing import Any, Generator


def dict_grouper(
    value_dict: dict[Any, float], percent: float = 30
) -> Generator[dict[Any, float], None, None]:
    """Group values so each value in a group is within "percent" of the others."""
    # Smallest value allowed in a group, as a fraction of its largest value
    scale = (200 - percent) / (200 + percent)
    grp: list[tuple[Any, float]] = []
    for item in sorted(value_dict.items(), key=lambda x: x[1]):
        start = item[1] * scale
        if grp and grp[0][1] < start:
            # item cannot join the whole group: emit the group, then keep
            # only the members that are still close enough to item
            yield dict(grp)
            grp = [x for x in grp[1:] if x[1] >= start]
        grp.append(item)

    # An empty value_dict yields nothing instead of raising RuntimeError
    if grp:
        yield dict(grp)


print(*dict_grouper(dict(zip("ABCDEFG", range(30, 100, 10)))), sep="\n")
```

Output:
```
{'A': 30, 'B': 40}
{'B': 40, 'C': 50}
{'C': 50, 'D': 60}
{'D': 60, 'E': 70, 'F': 80}
{'E': 70, 'F': 80, 'G': 90}
```
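For small inputs, an exhaustive search makes a handy cross-check of the sliding-window result. The sketch below is not from the thread: `pct_diff` and `brute_force_groups` are hypothetical helpers, pctDiff is assumed to be the difference over the mean, and all values are assumed positive. It tries every subset with `itertools.combinations` and keeps only the maximal groups whose members are all pairwise within `percent`.

```python
from itertools import combinations
from typing import Any


def pct_diff(a: float, b: float) -> float:
    """Percent difference of a and b relative to their mean (assumed definition)."""
    return 100 * abs(a - b) / ((a + b) / 2)


def brute_force_groups(
    value_dict: dict[Any, float], percent: float = 30
) -> list[dict[Any, float]]:
    """Every maximal group whose members are all pairwise within percent."""
    items = sorted(value_dict.items(), key=lambda x: x[1])
    found: list[dict[Any, float]] = []
    # Largest subsets first, so a valid subset of an already-found group
    # is recognised as non-maximal and skipped
    for size in range(len(items), 0, -1):
        for combo in combinations(items, size):
            if not all(
                pct_diff(a[1], b[1]) <= percent for a, b in combinations(combo, 2)
            ):
                continue
            keys = {k for k, _ in combo}
            if not any(keys.issubset(group) for group in found):
                found.append(dict(combo))
    # Report groups in order of their smallest value
    return sorted(found, key=lambda g: min(g.values()))


print(*brute_force_groups(dict(zip("ABCDEFG", range(30, 100, 10)))), sep="\n")
```

On the A to G data this should print the same five overlapping groups as `dict_grouper`. It is only practical for a handful of items, since the number of subsets doubles with every item added.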

This should be fast and stay fast. Using combinations with 7 items, there are 127 potential groups (every non-empty subset), and checking every pair inside every group takes 672 pctDiff calls. Both numbers grow exponentially as items are added. My algorithm computes a single threshold per item instead of a pctDiff per pair, so 7 items need at most 7 calculations, and apart from the initial sort the work grows linearly with the item count.
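The exhaustive cost can be counted directly. A small sketch, assuming the combinations approach checks every pair inside every non-empty subset exactly once (`exhaustive_counts` is a hypothetical helper): each pair of items appears together in 2^(n-2) subsets, so the number of checks is C(n, 2)·2^(n-2).

```python
from math import comb


def exhaustive_counts(n: int) -> tuple[int, int]:
    """(non-empty subsets, pairwise pctDiff checks over all of them) for n items."""
    subsets = 2**n - 1
    # Each pair of items appears together in 2**(n - 2) of the subsets
    pair_checks = comb(n, 2) * 2 ** (n - 2) if n >= 2 else 0
    return subsets, pair_checks


for n in (7, 14, 28):
    print(n, *exhaustive_counts(n))
```

Each extra item roughly doubles the number of subsets and more than doubles the number of pair checks, while the sliding-window pass above does one threshold calculation per item.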
