Mar-10-2023, 07:38 PM
(This post was last modified: Mar-10-2023, 07:38 PM by deanhystad.)
```python
data = {
    "v1_rev": 3000,
    "v2_rev": 4444,
    "v4_rev": 234534,
    "v5_rev": 5665,
    "v6_rev": 66,
    "v7_rev": 66,
    "v3_rev": 66,
}

# Sort items in increasing order of their value
items = iter(sorted(data.items(), key=lambda x: x[1]))
threshold = 30
bins = []
bin_ = [next(items)]
for item in items:
    # What is the smallest value that can be in bin with item?
    start = item[1] * (200 - threshold) / (200 + threshold)
    if bin_[0][1] < start:
        bins.append(bin_.copy())
    bin_.append(item)
    while bin_[0][1] < start:
        bin_.pop(0)
if bin_:
    bins.append(bin_)
bins = [dict(bin_) for bin_ in bins]
print(bins)
```
Output:
```
[{'v6_rev': 66, 'v7_rev': 66, 'v3_rev': 66}, {'v1_rev': 3000}, {'v2_rev': 4444, 'v5_rev': 5665}, {'v4_rev': 234534}]
```
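Where the `(200 - threshold) / (200 + threshold)` factor comes from, assuming pctDiff is the percent difference relative to the mean of the two values (that definition is an assumption here; it is not shown in this post). For two sorted values a ≤ b and a threshold p:

$$
\operatorname{pctDiff}(a, b) = \frac{100\,(b - a)}{(a + b)/2} \le p
\iff 200\,(b - a) \le p\,(a + b)
\iff a\,(200 + p) \ge b\,(200 - p)
\iff a \ge b \cdot \frac{200 - p}{200 + p}
$$

Because the values are sorted, only the smallest member of a bin has to be checked against this cutoff. With p = 30, for example, 4444 · 170/230 ≈ 3284.7, so 3000 cannot share a bin with 4444.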
The same algorithm as a generator, tested with values that produce overlapping bins:
```python
from typing import Any, Generator


def dict_grouper(
    value_dict: dict[Any, float], percent: float = 30
) -> Generator[dict[Any, float], None, None]:
    """Group values where each value in group is within "percent" of others."""
    scale = (200 - percent) / (200 + percent)
    items = iter(sorted(value_dict.items(), key=lambda x: x[1]))
    # A bare StopIteration inside a generator becomes RuntimeError (PEP 479),
    # so handle empty input explicitly.
    try:
        grp = [next(items)]
    except StopIteration:
        return
    for item in items:
        start = item[1] * scale
        if grp[0][1] < start:
            # Smallest member is too small for item: emit the group, keep survivors.
            yield dict(grp)
            grp = [x for x in grp[1:] if x[1] >= start]
        grp.append(item)
    if grp:
        yield dict(grp)


print(*dict_grouper(dict(zip("ABCDEFG", range(30, 100, 10)))), sep="\n")
```
Output:
```
{'A': 30, 'B': 40}
{'B': 40, 'C': 50}
{'C': 50, 'D': 60}
{'D': 60, 'E': 70, 'F': 80}
{'E': 70, 'F': 80, 'G': 90}
```
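One way to sanity-check the groups is to brute-force every pair inside each group. This is only a sketch: `pct_diff` and `check_groups` are hypothetical helpers, and `pct_diff` assumes pctDiff means the difference relative to the mean of the two (positive) values. `dict_grouper` is repeated with the empty-input guard so the block runs on its own.

```python
from itertools import combinations
from typing import Any, Generator


def dict_grouper(
    value_dict: dict[Any, float], percent: float = 30
) -> Generator[dict[Any, float], None, None]:
    """Group values where each value in group is within "percent" of others."""
    scale = (200 - percent) / (200 + percent)
    items = iter(sorted(value_dict.items(), key=lambda x: x[1]))
    try:
        grp = [next(items)]
    except StopIteration:
        return  # empty input: nothing to group
    for item in items:
        start = item[1] * scale
        if grp[0][1] < start:
            yield dict(grp)
            grp = [x for x in grp[1:] if x[1] >= start]
        grp.append(item)
    if grp:
        yield dict(grp)


def pct_diff(a: float, b: float) -> float:
    """Hypothetical pctDiff: absolute difference as a percent of the mean."""
    return abs(a - b) / ((a + b) / 2) * 100


def check_groups(groups: list[dict[Any, float]], percent: float = 30) -> bool:
    """True if every pair of values inside every group is within `percent`."""
    return all(
        pct_diff(a, b) <= percent
        for grp in groups
        for a, b in combinations(grp.values(), 2)
    )


data = dict(zip("ABCDEFG", range(30, 100, 10)))
groups = list(dict_grouper(data))
print(check_groups(groups))  # True
# Every key should land in at least one group.
print(set().union(*groups) == set(data))  # True
```

The pairwise check is quadratic per group, so it is meant for testing, not for doing the grouping itself.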
This should be very fast and stay fast. Using combinations with 7 items there are 127 potential groups (every non-empty subset, 2^7 - 1), and you would compute 742 pctDiff values. Both numbers grow exponentially as the number of items increases. My algorithm never compares pairs: after sorting, it computes the equivalent of one pctDiff (the cutoff) per item, so apart from the O(n log n) sort the work grows roughly linearly with the item count.
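To make the growth concrete, here is a small sketch that literally enumerates the candidate groups a combinations-based search would consider, using `itertools.combinations`. The function name `count_candidate_groups` is hypothetical, and it only counts subsets; it does not reproduce any particular pctDiff-counting scheme.

```python
from itertools import combinations


def count_candidate_groups(n: int) -> int:
    """Enumerate every non-empty subset of n items and count them."""
    items = range(n)
    return sum(1 for k in range(1, n + 1) for _ in combinations(items, k))


for n in (7, 10, 15):
    # Brute force grows as 2**n - 1; the single pass does about one cutoff per item.
    print(f"{n} items: {count_candidate_groups(n)} candidate groups")
# 7 items: 127 candidate groups
# 10 items: 1023 candidate groups
# 15 items: 32767 candidate groups
```

Doubling the item count squares (roughly) the number of candidate groups, while the single pass over the sorted values only doubles its cutoff computations.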