Python Forum
How to group related products in relationship groups?
#1
Hi,

I have a dataset that consists of relationships between an old product and a new product. I would like to group them into relationship groups. I have written a script with some sample data. The expected result is the number at the end of each row. This column is not part of the original data, which has only two columns: Old product and New product.

As you can see, I don't get the expected result for this row: ("9000825_ENDOS025", "9000825_NEXO025", 3)

The relationship between these two products is put in group 5 (see the output below), but I want it in group 3, because the product key 9000825_NEXO025 appears on both the left and the right side.

I tried sorting the dataset, which gave me the right result, but I don't think I can rely on sorting a dataset of 134,000 rows. How can I change the code to get the desired result?

Best regards

Morten

def group_related_products(data):
    groups = []
    for row in data:
        old_product, new_product, group = row
        found_group = False
        for existing_group in groups:
            if old_product in existing_group:
                existing_group.add(new_product)
                found_group = True
                break
        if not found_group:
            groups.append({old_product, new_product})
    return groups

# Sample data
data = [
    ("9000002_88008621", "9000002_88008621", 1),
    ("9000002_88008621", "9000463_2526534", 1),
    ("9000002_88008625", "9000002_88008625", 2),
    ("9000002_88008625", "9000463_160159", 2),
    ("9000825_NEXO025", "9000756_13002", 3),
    ("9000756_13002", "9000756_13004", 3),
    ("9000756_42420", "9000756_42431", 4),
    ("9000002_88008621", "9006274_88008621", 1),
    ("9000825_ENDOS025", "9000825_NEXO025", 3),
    ("9032273_006899", "9000048_1000010123", 6),
    ("9032273_006899", "9000035_KZDC120003", 6),
    ("9032273_006899", "9000028_IV-9001B", 6),
    ("9032272_BH-EGF", "9000048_1000010123", 7),
    ("9032272_BH-EGF", "9000035_KZDC120003", 7),
    ("9032272_BH-EGF", "9000028_IV-9001B", 7),
]

# Group related products
related_groups = group_related_products(data)

# Print the groups
for i, group in enumerate(related_groups, 1):
    print(f"Group {i}: {group}")
Output:
Group 1: {'9000463_2526534', '9006274_88008621', '9000002_88008621'}
Group 2: {'9000463_160159', '9000002_88008625'}
Group 3: {'9000756_13002', '9000756_13004', '9000825_NEXO025'}
Group 4: {'9000756_42431', '9000756_42420'}
Group 5: {'9000825_ENDOS025', '9000825_NEXO025'}
Group 6: {'9000048_1000010123', '9000035_KZDC120003', '9000028_IV-9001B', '9032273_006899'}
Group 7: {'9000048_1000010123', '9032272_BH-EGF', '9000035_KZDC120003', '9000028_IV-9001B'}
#2
It seems that you are looking for the connected components of an undirected graph. You could use a specialized module such as networkx for this:
data = [
    ("9000002_88008621", "9000002_88008621", 1),
    ("9000002_88008621", "9000463_2526534", 1),
    ("9000002_88008625", "9000002_88008625", 2),
    ("9000002_88008625", "9000463_160159", 2),
    ("9000825_NEXO025", "9000756_13002", 3),
    ("9000756_13002", "9000756_13004", 3),
    ("9000756_42420", "9000756_42431", 4),
    ("9000002_88008621", "9006274_88008621", 1),
    ("9000825_ENDOS025", "9000825_NEXO025", 3),
    ("9032273_006899", "9000048_1000010123", 6),
    ("9032273_006899", "9000035_KZDC120003", 6),
    ("9032273_006899", "9000028_IV-9001B", 6),
    ("9032272_BH-EGF", "9000048_1000010123", 7),
    ("9032272_BH-EGF", "9000035_KZDC120003", 7),
    ("9032272_BH-EGF", "9000028_IV-9001B", 7),
]

import networkx as nx
G = nx.Graph()
for a, b, _ in data:
    G.add_edge(a, b)
for c in nx.connected_components(G):
    print(c)
Output:
{'9006274_88008621', '9000463_2526534', '9000002_88008621'}
{'9000463_160159', '9000002_88008625'}
{'9000756_13002', '9000756_13004', '9000825_NEXO025', '9000825_ENDOS025'}
{'9000756_42431', '9000756_42420'}
{'9000028_IV-9001B', '9032272_BH-EGF', '9000035_KZDC120003', '9000048_1000010123', '9032273_006899'}
You can also use other implementations. For example, it seems that there are two implementations in Rosetta Code, including Tarjan's algorithm. Tarjan's algorithm is meant for directed graphs, so you may need to add the reversed edges to your graph.
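If you would rather not depend on networkx, a union-find (disjoint-set) structure is another way to get the same grouping in plain Python. The following is only a minimal sketch, not code from this thread: the function name group_rows and the shortened sample are my own choices, and it assumes every row starts with the old and new product keys. It does not depend on the order of the rows, so no sorting is needed.

```python
def group_rows(rows):
    """Return one group number per row, numbered in order of first appearance."""
    parent = {}  # product key -> parent key; a root is its own parent

    def find(x):
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every key on the walked path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    # Merge the two ends of every relationship into one set.
    for old, new, *_ in rows:
        root_old, root_new = find(old), find(new)
        if root_old != root_new:
            parent[root_new] = root_old

    # Number the sets in the order their first row appears.
    numbers = {}
    labels = []
    for old, *_ in rows:
        root = find(old)
        labels.append(numbers.setdefault(root, len(numbers) + 1))
    return labels


# Shortened sample: the last row arrives "late" but still joins group 1.
rows = [
    ("9000825_NEXO025", "9000756_13002"),
    ("9000756_13002", "9000756_13004"),
    ("9000756_42420", "9000756_42431"),
    ("9000825_ENDOS025", "9000825_NEXO025"),
]
print(group_rows(rows))  # [1, 1, 2, 1]
```

Note that, as in the networkx output above, the rows marked 6 and 7 in the sample data would end up in a single group here, because they share the same new products.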
#3
Not too sure what this is all about.

134,000 rows, no problem! Send them over!

# Sample data
data = [
    ("9000002_88008621", "9000002_88008621", 1),
    ("9000002_88008621", "9000463_2526534", 1),
    ("9000002_88008625", "9000002_88008625", 2),
    ("9000002_88008625", "9000463_160159", 2),
    ("9000825_NEXO025", "9000756_13002", 3),
    ("9000756_13002", "9000756_13004", 3),
    ("9000756_42420", "9000756_42431", 4),
    ("9000002_88008621", "9006274_88008621", 1),
    ("9000825_ENDOS025", "9000825_NEXO025", 3),
    ("9032273_006899", "9000048_1000010123", 6),
    ("9032273_006899", "9000035_KZDC120003", 6),
    ("9032273_006899", "9000028_IV-9001B", 6),
    ("9032272_BH-EGF", "9000048_1000010123", 7),
    ("9032272_BH-EGF", "9000035_KZDC120003", 7),
    ("9032272_BH-EGF", "9000028_IV-9001B", 7),
]

data_dict = {row[0]:[] for row in data}
len(data) # returns 15
len(data_dict) # returns 8: some old products are related to more than 1 new product

for row in data:
    tup = (row[1], row[2])
    data_dict[row[0]].append(tup)

for item in data_dict.items():
    print(item)

count = 1
for key in data_dict.keys():
    print(f"Group {count}: old product = {key}, related new products = {data_dict[key]}")
    count +=1
Gives:

Output:
Group 1: old product = 9000002_88008621, related new products = [('9000002_88008621', 1), ('9000463_2526534', 1), ('9006274_88008621', 1)]
Group 2: old product = 9000002_88008625, related new products = [('9000002_88008625', 2), ('9000463_160159', 2)]
Group 3: old product = 9000825_NEXO025, related new products = [('9000756_13002', 3)]
Group 4: old product = 9000756_13002, related new products = [('9000756_13004', 3)]
Group 5: old product = 9000756_42420, related new products = [('9000756_42431', 4)]
Group 6: old product = 9000825_ENDOS025, related new products = [('9000825_NEXO025', 3)]
Group 7: old product = 9032273_006899, related new products = [('9000048_1000010123', 6), ('9000035_KZDC120003', 6), ('9000028_IV-9001B', 6)]
Group 8: old product = 9032272_BH-EGF, related new products = [('9000048_1000010123', 7), ('9000035_KZDC120003', 7), ('9000028_IV-9001B', 7)]
I am not clear on what exactly you want to do with the values of data_dict.
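If the goal is the relationship groups asked for in post #1, a dictionary like this could serve as a starting point, provided each link is stored in both directions and the dictionary is then walked to collect everything reachable from a product. The sketch below is only an illustration under that assumption: the names relationship_groups and neighbours are hypothetical, and the sample is a shortened version of the data above.

```python
from collections import defaultdict, deque


def relationship_groups(rows):
    """Return a list of sets, one per group of directly or indirectly linked products."""
    # Store every link in both directions so the walk can go old -> new and new -> old.
    neighbours = defaultdict(set)
    for old, new, *_ in rows:
        neighbours[old].add(new)
        neighbours[new].add(old)

    seen = set()
    groups = []
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first walk collecting everything reachable from `start`.
        seen.add(start)
        group = set()
        queue = deque([start])
        while queue:
            product = queue.popleft()
            group.add(product)
            for other in neighbours[product]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        groups.append(group)
    return groups


rows = [
    ("9000825_NEXO025", "9000756_13002"),
    ("9000756_13002", "9000756_13004"),
    ("9000756_42420", "9000756_42431"),
    ("9000825_ENDOS025", "9000825_NEXO025"),
]
for i, group in enumerate(relationship_groups(rows), 1):
    print(f"Group {i}: {sorted(group)}")
```

Each product and each link is visited once, so the row order does not matter and 134,000 rows should not be a problem for this approach.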