Python Forum
Grouping Data based on 30% bracket
#1
I want to form groups in which all values are within 30% of each other.

My working code is as follows:

from itertools import combinations


def pctDiff(A, B):
    # Symmetric percent difference: |A - B| relative to the mean of A and B.
    return abs(A - B) * 200 / (A + B)


def main():
    account = {'acct_number': 10202, 'acct_name': 'abc', 'v1_rev': 3000, 'v2_rev': 4444, 'v4_rev': 234534, 'v5_rev': 5665, 'v6_rev': 66, 'v7_rev': 66, 'v3_rev': 66}
    vendors_revenue_list = ['v1_rev', 'v2_rev', 'v3_rev', 'v4_rev', 'v5_rev', 'v6_rev', 'v7_rev', 'v8_rev']
    # Keep only the vendor revenue fields that are present in the record.
    dict2 = {k: account[k] for k in vendors_revenue_list if k in account}
    print(dict2)

    # Test every pair of vendors once and keep the pairs within 30% of each other.
    groups = [(a, b) for a, b in combinations(dict2, 2) if pctDiff(dict2[a], dict2[b]) <= 30]
    print(groups)


main()
Output:
{'v1_rev': 3000, 'v2_rev': 4444, 'v3_rev': 66, 'v4_rev': 234534, 'v5_rev': 5665, 'v6_rev': 66, 'v7_rev': 66}
[('v2_rev', 'v5_rev'), ('v3_rev', 'v6_rev'), ('v3_rev', 'v7_rev'), ('v6_rev', 'v7_rev')]
Desired output:

Output:
[('v2_rev', 'v5_rev'), ('v3_rev', 'v6_rev','v7_rev')]
#2
What do you want done if a, b, and c are within 30% of each other, and b, c, and d are within 30% of each other? Should the groups be:
(a, b, c), (d)
(a, b, c), (b, c, d)
(a), (b), (c), (d), (a, b), (a, c), (b, c), (b, d), (c, d), (a, b, c), (b, c, d)
something else?
#3
(Mar-09-2023, 06:26 PM)deanhystad Wrote: What do you want done if a, b, and c are within 30% of each other, and b, c, and d are within 30% of each other? Should the groups be:
(a, b, c), (d)
(a, b, c), (b, c, d)
(a), (b), (c), (d), (a, b), (a, c), (b, c), (b, d), (c, d), (a, b, c), (b, c, d)
something else?
#4
I want the result to be

(a, b, c) and (b, c, d)

With the current implementation, I think we will get

(a, b), (b, c), (c, a), (c, d), (b, d)
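
To make the overlap case concrete, here is a minimal sketch with made-up numbers. The values 100, 110, 125 and 145 for a, b, c and d are assumptions chosen for illustration, not data from this thread. It reuses the pctDiff formula from the first post and shows that the pairwise approach returns only pairs, never the larger groups.

```python
from itertools import combinations


def pctDiff(a, b):
    # Same symmetric percent difference as in the first post.
    return abs(a - b) * 200 / (a + b)


# Hypothetical values: a, b, c are mutually within 30%, and so are b, c, d,
# but a and d are more than 30% apart.
values = {"a": 100, "b": 110, "c": 125, "d": 145}

pairs = [
    (x, y)
    for x, y in combinations(values, 2)
    if pctDiff(values[x], values[y]) <= 30
]
print(pairs)
```

Only the pair (a, d) is rejected, so the list holds the five pairs above rather than the two desired triples.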
#5
data = {
    "v1_rev": 3000,
    "v2_rev": 4444,
    "v4_rev": 234534,
    "v5_rev": 5665,
    "v6_rev": 66,
    "v7_rev": 66,
    "v3_rev": 66,
}
 
# Sort items in increasing order of their value
items = iter(sorted(data.items(), key=lambda x: x[1]))
threshold = 30
 
bins = []
bin_ = [next(items)]
for item in items:
    # What is the smallest value that can be in bin with item?
    start = item[1] * (200 - threshold) / (200 + threshold)
    if bin_[0][1] < start:
        bins.append(bin_.copy())
    bin_.append(item)
    while bin_[0][1] < start:
        bin_.pop(0)
 
if bin_:
    bins.append(bin_)
 
bins = [dict(bin_) for bin_ in bins]
print(bins)
Output:
[{'v6_rev': 66, 'v7_rev': 66, 'v3_rev': 66}, {'v1_rev': 3000}, {'v2_rev': 4444, 'v5_rev': 5665}, {'v4_rev': 234534}]
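
The `start` bound in the loop follows from the pctDiff formula in the first post. A short derivation, assuming positive values with a ≤ b and writing p for the threshold (the symbols a, b and p are introduced here only for illustration):

```latex
\begin{align*}
\frac{200\,(b - a)}{a + b} &\le p \\
200\,(b - a) &\le p\,(a + b) \\
(200 - p)\,b &\le (200 + p)\,a \\
a &\ge b \cdot \frac{200 - p}{200 + p}
\end{align*}
```

With p = 30 the factor is 170/230, roughly 0.739, so an item worth 4444 can only share a bin with values of about 3285 or more. That is why v1_rev (3000) ends up alone while v2_rev and v5_rev are grouped together.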
Here is the same algorithm written as a generator, tested with overlapping bins:
from typing import Any, Generator
 
 
def dict_grouper(
    value_dict: dict[Any, float], percent: float = 30
) -> Generator[dict[Any, float], None, None]:
    """Group values where each value in group is within "percent" of others."""
    scale = (200 - percent) / (200 + percent)
    items = iter(sorted(value_dict.items(), key=lambda x: x[1]))
    grp = [next(items)]
    for item in items:
        start = item[1] * scale
        if grp[0][1] < start:
            yield dict(grp)
            grp = [x for x in grp[1:] if x[1] >= start]
        grp.append(item)
 
    if grp:
        yield dict(grp)
 
 
print(*dict_grouper(dict(zip(("ABCDEFG"), range(30, 100, 10)))), sep="\n")
Output:
{'A': 30, 'B': 40}
{'B': 40, 'C': 50}
{'C': 50, 'D': 60}
{'D': 60, 'E': 70, 'F': 80}
{'E': 70, 'F': 80, 'G': 90}
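
One way to gain confidence in the generator is to check its output against the pairwise definition. The sketch below is a rough test harness, not part of the original post: it repeats dict_grouper without type hints so the block is self-contained, and the helper names pct_diff, all_within and is_maximal are hypothetical. It checks that every pair inside a group is within 30%, and that no outside item could be added to a group without breaking that rule.

```python
from itertools import combinations


def pct_diff(a, b):
    # Symmetric percent difference, same formula as pctDiff in the first post.
    return abs(a - b) * 200 / (a + b)


def dict_grouper(value_dict, percent=30):
    # Sliding-window grouper repeated from the post above.
    scale = (200 - percent) / (200 + percent)
    items = iter(sorted(value_dict.items(), key=lambda x: x[1]))
    grp = [next(items)]
    for item in items:
        start = item[1] * scale
        if grp[0][1] < start:
            yield dict(grp)
            grp = [x for x in grp[1:] if x[1] >= start]
        grp.append(item)
    if grp:
        yield dict(grp)


def all_within(values, percent=30):
    # True when every pair of values is within `percent` of each other.
    return all(pct_diff(x, y) <= percent for x, y in combinations(values, 2))


def is_maximal(group, value_dict, percent=30):
    # True when no item outside the group could join it without breaking the rule.
    outside = (v for k, v in value_dict.items() if k not in group)
    return not any(all_within([*group.values(), v], percent) for v in outside)


data = dict(zip("ABCDEFG", range(30, 100, 10)))
groups = list(dict_grouper(data))

assert all(all_within(g.values()) for g in groups)
assert all(is_maximal(g, data) for g in groups)
print(f"{len(groups)} groups passed both checks")
```

The brute-force helpers are quadratic or worse, so they are only meant for small test inputs like this one.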
This should be very fast and stay fast. Using combinations with 7 items, there are 127 potential groups and you would compute 742 pctDiff values. As the number of items increases, both of these numbers grow rapidly. My algorithm only has to compute a pctDiff-style bound 7 times, and once the items are sorted, the number of calculations grows linearly with the item count.