Python Forum
remove duplicates from dicts with list values
#11
(May-24-2024, 04:10 PM)deanhystad Wrote: I'm not sure if this is right since you never clarified what "duplicate" means in this particular case, but here's your band-aid.
dict1 = {"SAG01112_SSAP_HA_LPM": [["OS_TYPE", "AIX"], ["IS_COBOL", "1"]], "SAP": [], "C11_RG": [], "W11_RG": []}
dict2 = {
    "SAG01112_SSAP_HA_LPM": [
        ["OS_TYPE", "AIX"],
        ["IP", "172.17.10.112"],
        ["IP", "10.111.160.119"],
        ["IP", "10.111.160.68"],
        ["IP", "10.111.160.66"],
        ["IP", "10.95.0.112"],
        ["IP", "10.111.162.119"],
    ],
    "SAP": [],
    "C11_RG": [],
    "W11_RG": [],
}


def remove_common_items(dict_a, dict_b):
    """Remove values that are common to a and b."""
    # For common keys
    for key in set(dict_a) & set(dict_b):
        a = dict_a[key]
        b = dict_b[key]
        # Remove items common to both value lists.
        for item in [x for x in a if x in b]:  # Make list of common before iterating
            a.remove(item)
            b.remove(item)


remove_common_items(dict1, dict2)
print(dict1)
print(dict2)
I still think your dictionaries are messed up and should look like this:

hm...this also looks usable, but i'm not sure if sets will work with embedded lists, as lists are not hashable afaik. will try...
and no need to nitpick on the term "duplicate"...if a value is in both dicts i call this duplicate, and i want to remove it in the second dict, easy as that.
#12
There is nothing in the code that requires lists to be hashable. Only the keys are hashed. The iterator for a dict iterates the keys.
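
A quick way to check this, as a minimal sketch with made-up sample data (`dict_a`, `dict_b` and the `HOST` key are illustrative names, not from the real script): building a set from a dict only hashes the keys, and the list values are compared with `==`, so they never need to be hashable.

```python
# Illustrative data: string keys, list-of-lists values.
dict_a = {"HOST": [["OS_TYPE", "AIX"]], "SAP": []}
dict_b = {"HOST": [["IP", "10.0.0.1"]], "C11_RG": []}

# set(d) iterates the dict, which yields its keys only.
common_keys = set(dict_a) & set(dict_b)
print(common_keys)  # {'HOST'}

# Membership tests on a list use ==, not hashing, so nested lists are fine.
print(["OS_TYPE", "AIX"] in dict_a["HOST"])  # True

# Hashing a list directly is the operation that fails.
try:
    hash(["OS_TYPE", "AIX"])
except TypeError as exc:
    hash_error = str(exc)
    print(hash_error)
```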

This is meaningless.
Quote:if a value is in both dicts i call this duplicate, and i want to remove it in the second dict, easy as that.
In your example, SAP, C11_RG and W11_RG are duplicates, but SAG01112_SSAP_HA_LPM is unique. Are you looking for duplicate values in the lists?

If two lists are referenced by different keys but contain the same values, are the lists duplicates? Details matter. You know the problem you are trying to solve. You need to provide enough information that others understand the problem well enough to help.

You say you have to use lists because the keys are not unique. You are just looking at the problem wrong. I think your dictionary should look like this:
dict2 = {
    "SAG01112_SSAP_HA_LPM": {
        "OS_TYPE": "AIX",
        "IP": ["172.17.10.112", "10.111.160.119", "10.111.160.68", "10.111.160.66", "10.95.0.112", "10.111.162.119"],
    },
}
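
To illustrate why a layout like this can be easier to compare, here is a rough sketch. It assumes each host name maps to a dict of attribute name to value, with multi-valued attributes such as `IP` stored as a list. The helper `changed_attributes` and the `old`/`new` sample data are hypothetical and not part of the real script.

```python
_MISSING = object()  # sentinel so a stored None is not confused with "absent"


def changed_attributes(old, new):
    """Hypothetical helper: attributes in `new` that are absent from or different in `old`."""
    result = {}
    for host, attrs in new.items():
        old_attrs = old.get(host, {})
        diff = {
            name: value
            for name, value in attrs.items()
            if old_attrs.get(name, _MISSING) != value
        }
        if diff:
            result[host] = diff
    return result


old = {"SAG01112_SSAP_HA_LPM": {"OS_TYPE": "AIX", "IP": ["172.17.10.112"]}}
new = {"SAG01112_SSAP_HA_LPM": {"OS_TYPE": "AIX", "IP": ["172.17.10.112", "10.95.0.112"]}}

changed = changed_attributes(old, new)
print(changed)
# {'SAG01112_SSAP_HA_LPM': {'IP': ['172.17.10.112', '10.95.0.112']}}
```

With unique attribute names per host, "is this attribute the same in both dicts" becomes a plain key lookup instead of a search through a list of pairs.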
#13
Does this give the wanted result?
from pprint import pprint

dict1 = {'SAG01112_SSAP_HA_LPM': [['OS_TYPE', 'AIX'], ['IS_COBOL', '1']], 'SAP': [], 'C11_RG': [], 'W11_RG': []}
dict2 = {'SAG01112_SSAP_HA_LPM': [['OS_TYPE', 'AIX'], ['IP', '172.17.10.112'], ['IP', '10.111.160.119'], ['IP', '10.111.160.68'], ['IP', '10.111.160.66'], ['IP', '10.95.0.112'], ['IP', '10.111.162.119']], 'SAP': [], 'C11_RG': [], 'W11_RG': []}

# Convert lists to sets of tuples for comparison and removal of duplicates
flattened_dict1 = {k: set(map(tuple, v)) for k, v in dict1.items()}
flattened_dict2 = {k: set(map(tuple, v)) for k, v in dict2.items()}
for key in flattened_dict1:
    if key in flattened_dict2:
        common_elements = flattened_dict1[key] & flattened_dict2[key]
        flattened_dict2[key] -= common_elements

# Convert sets of tuples back to lists of lists
dict2 = {k: [list(item) for item in v] for k, v in flattened_dict2.items()}

pprint(dict2)
Output:
{'C11_RG': [], 'SAG01112_SSAP_HA_LPM': [['IP', '10.111.162.119'], ['IP', '10.111.160.66'], ['IP', '172.17.10.112'], ['IP', '10.95.0.112'], ['IP', '10.111.160.119'], ['IP', '10.111.160.68']], 'SAP': [], 'W11_RG': []}
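
Note that the round trip through sets does not keep the original order of the items (the IP order in the output above differs from the input) and would also collapse repeated entries. If order matters, one possible variant is to use the set of tuples only for lookups and filter the original list. This is a sketch; `without_common` is a made-up name.

```python
dict1 = {'SAG01112_SSAP_HA_LPM': [['OS_TYPE', 'AIX'], ['IS_COBOL', '1']], 'SAP': [], 'C11_RG': [], 'W11_RG': []}
dict2 = {'SAG01112_SSAP_HA_LPM': [['OS_TYPE', 'AIX'], ['IP', '172.17.10.112'], ['IP', '10.111.160.119'], ['IP', '10.111.160.68'], ['IP', '10.111.160.66'], ['IP', '10.95.0.112'], ['IP', '10.111.162.119']], 'SAP': [], 'C11_RG': [], 'W11_RG': []}


def without_common(target, reference):
    """Return a new dict: target's items minus those also in reference, order kept."""
    result = {}
    for key, items in target.items():
        # Tuples are hashable, so the reference items can go into a set for fast lookup.
        seen = {tuple(item) for item in reference.get(key, [])}
        result[key] = [item for item in items if tuple(item) not in seen]
    return result


filtered = without_common(dict2, dict1)
print(filtered['SAG01112_SSAP_HA_LPM'][0])  # ['IP', '172.17.10.112']
```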
#14
The values in the dictionaries are lists, so it is no good just comparing the values as a whole; you need to loop through each list and eliminate the items that are also in the other dictionary's list for the same key.

dict1 = {'SAG01112_SSAP_HA_LPM': [['OS_TYPE', 'AIX'], ['IS_COBOL', '1']], 'SAP': [], 'C11_RG': [], 'W11_RG': []}

dict2 = {'SAG01112_SSAP_HA_LPM': [['OS_TYPE', 'AIX'], ['IP', '172.17.10.112'], ['IP', '10.111.160.119'], ['IP', '10.111.160.68'], ['IP', '10.111.160.66'], ['IP', '10.95.0.112'], ['IP', '10.111.162.119']], 'SAP': [], 'C11_RG': [], 'W11_RG': []}

# dict1 and dict2 have the same keys here: dict1.keys() == dict2.keys() is True
dict3 = {}
dict4 = {}
# a key whose list ends up empty is kept, with an empty list as its value
for key in dict1:
    print(f'key = {key}, value = {dict1[key]}')
    print(f'key = {key}, value = {dict2[key]}')
    # keep only the items that do not appear in the other dict's list for this key
    dict3[key] = [item for item in dict1[key] if item not in dict2[key]]
    dict4[key] = [item for item in dict2[key] if item not in dict1[key]]

# have a look
# compare the corresponding dictionaries
for key in dict1.keys():
    print('dict1 and dict3')
    print(f'This is dict1: key = {key}, value = {dict1[key]}')
    print(f'This is dict3: key = {key}, value = {dict3[key]}')
    print('\ndict2 and dict4 \n')
    print(f'This is dict2: key = {key}, value = {dict2[key]}')
    print(f'This is dict4: key = {key}, value = {dict4[key]}')
Fun on Saturday morning!
#15
My try...

dict1 = {
    "SAG01112_SSAP_HA_LPM": [["OS_TYPE", "AIX"], ["IS_COBOL", "1"]],
    "SAP": [],
    "C11_RG": [],
    "W11_RG": [],
}
dict2 = {
    "SAG01112_SSAP_HA_LPM": [
        ["OS_TYPE", "AIX"],
        ["IP", "172.17.10.112"],
        ["IP", "10.111.160.119"],
        ["IP", "10.111.160.68"],
        ["IP", "10.111.160.66"],
        ["IP", "10.95.0.112"],
        ["IP", "10.111.162.119"],
    ],
    "SAP": [],
    "C11_RG": [],
    "W11_RG": [],
}



from itertools import chain


def deduplicate(*dicts):
    """
    Generator: Deduplicate all iterable values for each key for each dict.
    The order is kept by the occurrence of keys in the first dict, second dict, ...
    """
    all_keys = []
    # iterate over *dicts and append them only, if they don't exist
    # this keeps the key-order of the first dict, second dict, ...
    for key in chain.from_iterable(dicts):
        if key not in all_keys:
            all_keys.append(key)

    print("all_keys:", all_keys)

    # iterate over all keys
    for key in all_keys:
        # list of results, this will be yielded later
        result = []
        # iterate over dicts
        for input_dict in dicts:
            # for each dict, call get(key, []), which returns an empty list
            # if the key does not exist; otherwise the value is returned
            for value in input_dict.get(key, []):
                if value in result:
                    print("Duplicate:", value)
                    # skipping the value if it's already in result
                    continue
                # append, if the value is not in result
                result.append(value)

        yield key, result
        # you could also call result.clear() here,
        # but then you would have to yield a copy with result.copy();
        # assigning a new list instead means no copy is made
        result = []


def deduplicate2dict(*dicts):
    """
    Wrapper function to return a dict
    """
    return dict(deduplicate(*dicts))


result1 = dict(deduplicate(dict1, dict2))
result2 = deduplicate2dict(dict1, dict2)
#16
(May-24-2024, 04:10 PM)deanhystad Wrote: I'm not sure if this is right since you never clarified what "duplicate" means in this particular case, but here's your band-aid.

just tested this one. it works correctly on the test data, but in the real script the dict for the deleted data always ends up as None, no matter what is in the comparison dict. so this does not work as expected.

output with data to change and delete:

root@ssap: /tmp # ./aix_reg_client_tcp_dev.py --run-now
changed: {'SAG01112_SSAP_HA_LPM': [['OS_TYPE', 'AIX'], ['IP', '172.17.10.112'], ['IP', '10.111.160.119'], ['IP', '10.111.160.68'], ['IP', '10.111.160.66'], ['IP', '10.95.0.112'], ['IP', '10.111.162.119']], 'C11_RG': [['RG_SERVICE_LABEL', 'c11appl0']], 'W11_RG': [['RG_SERVICE_LABEL', 'w11appl0']]}
ONLY ECHO, NOTHING SENT TO SERVER
DEBUG ON: STD|1|SAG01112_SSAP_HA_LPM|OS_TYPE|AIX|63b560175e8a15e180b58498f6910fb2
DEBUG ON: STD|1|SAG01112_SSAP_HA_LPM|IP|172.17.10.112|4adbae9c527c11019ae7183aceb88981
DEBUG ON: STD|1|SAG01112_SSAP_HA_LPM|IP|10.111.160.119|3a45d15ee636109a924578a238760da9
DEBUG ON: STD|1|SAG01112_SSAP_HA_LPM|IP|10.111.160.68|45e9aefd72fe1c92984cd74d3166d063
DEBUG ON: STD|1|SAG01112_SSAP_HA_LPM|IP|10.111.160.66|9023358d7109190cb56a9d0d3c074ea1
DEBUG ON: STD|1|SAG01112_SSAP_HA_LPM|IP|10.95.0.112|2daad6abb5fd12228bfde1811e581dde
DEBUG ON: STD|1|SAG01112_SSAP_HA_LPM|IP|10.111.162.119|d88cef25feb71e13ad6cd449be90b8b3
DEBUG ON: STD|1|C11_RG|RG_SERVICE_LABEL|c11appl0|27262efc18961f40a898635b5c7de7af
DEBUG ON: STD|1|W11_RG|RG_SERVICE_LABEL|w11appl0|796ab32081c81a3280e7db06daa3a488
deleted: {'SAG01112_SSAP_HA_LPM': [['OS_TYPE', 'AIXi']], 'SAP': [], 'C11_RG': [['RG_SERVICE_LABEL', 'BLAA']], 'W11_RG': [['RG_SERVICE_LABEL', 'BLAA']]}
del filtered None
output with no data to change but delete data:

root@ssap: /tmp # ./aix_reg_client_tcp_dev.py --run-now
changed: {'SAG01112_SSAP_HA_LPM': []}
ONLY ECHO, NOTHING SENT TO SERVER
deleted: {'SAG01112_SSAP_HA_LPM': [['IP', '172.17.10.112'], ['IP', '10.111.160.119'], ['IP', '10.111.160.68'], ['IP', '10.111.160.66'], ['IP', '10.95.0.112'], ['IP', '10.111.162.119']], 'SAP': [], 'C11_RG': [], 'W11_RG': []}
del filtered None
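
One possible explanation for `del filtered None`, offered as a guess since the real script is not shown: `remove_common_items` edits both dicts in place and has no `return` statement, so assigning its result to a variable always gives `None`. The sketch below reproduces that and adds a hypothetical wrapper, `remove_common_items_copy`, that works on deep copies and returns them. The sample data is shortened from the thread.

```python
import copy


def remove_common_items(dict_a, dict_b):
    """As posted above: edits both dicts in place and implicitly returns None."""
    for key in set(dict_a) & set(dict_b):
        a = dict_a[key]
        b = dict_b[key]
        for item in [x for x in a if x in b]:
            a.remove(item)
            b.remove(item)


def remove_common_items_copy(dict_a, dict_b):
    """Hypothetical wrapper: leave the inputs alone and return filtered copies."""
    a_copy = copy.deepcopy(dict_a)
    b_copy = copy.deepcopy(dict_b)
    remove_common_items(a_copy, b_copy)
    return a_copy, b_copy


deleted = {"SAG01112_SSAP_HA_LPM": [["OS_TYPE", "AIX"], ["IS_COBOL", "1"]], "SAP": []}
changed = {"SAG01112_SSAP_HA_LPM": [["OS_TYPE", "AIX"]]}

# Assigning the result of the in-place function yields None.
returned = remove_common_items(copy.deepcopy(deleted), copy.deepcopy(changed))
print(returned)  # None

del_filtered, changed_filtered = remove_common_items_copy(deleted, changed)
print(del_filtered)  # {'SAG01112_SSAP_HA_LPM': [['IS_COBOL', '1']], 'SAP': []}
```

With the in-place version, the filtered data is in the dicts that were passed in, so those are what should be printed or sent afterwards. Note that it also removes the common items from the second dict, which may not be wanted.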
#17
(May-25-2024, 08:10 AM)Pedroski55 Wrote: The values in the dictionaries are lists, so it is no good just looking at the values, you need to loop through the lists and eliminate values which are also in the comparee dictionary.

this does not work either...

root@ssap: /tmp # ./aix_reg_client_tcp_dev.py --run-now
changed: {'SAG01112_SSAP_HA_LPM': [['UPTIME', '109\n']]}
ONLY ECHO, NOTHING SENT TO SERVER
DEBUG ON: STD|1|SAG01112_SSAP_HA_LPM|UPTIME|109|599acfaa8d5d1c04a19324461bed1003
deleted: {'SAG01112_SSAP_HA_LPM': [['UPTIME', '108\n']], 'SAP': [], 'C11_RG': [], 'W11_RG': []}
key = SAG01112_SSAP_HA_LPM, value = [['UPTIME', '108\n']]
key = SAG01112_SSAP_HA_LPM, value = [['UPTIME', '109\n']]
key = SAP, value = []
Traceback (most recent call last):
  File "/tmp/./aix_reg_client_tcp_dev.py", line 757, in <module>
    reg_client_runner()
  File "/tmp/./aix_reg_client_tcp_dev.py", line 718, in reg_client_runner
    del_data_filtered = remove_common_items(del_data, changed_data)
  File "/tmp/./aix_reg_client_tcp_dev.py", line 138, in remove_common_items
    print(f'key = {key}, value = {dict2[key]}')
KeyError: 'SAP'
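
The traceback suggests that `changed_data` has no `'SAP'` key while `del_data` does (the `changed:` line above lists a single key), so the assumption that both dicts share the same keys does not hold in the real script. A minimal sketch of one way around it uses `dict.get` with an empty-list default. The sample values are copied from the output above, and the variable names follow the traceback.

```python
del_data = {"SAG01112_SSAP_HA_LPM": [["UPTIME", "108\n"]], "SAP": [], "C11_RG": [], "W11_RG": []}
changed_data = {"SAG01112_SSAP_HA_LPM": [["UPTIME", "109\n"]]}

del_filtered = {}
for key, items in del_data.items():
    # A key missing from changed_data is treated as "nothing changed for this key".
    other = changed_data.get(key, [])
    del_filtered[key] = [item for item in items if item not in other]

print(del_filtered)
```

This avoids the `KeyError`, but with this particular data nothing is filtered out, because `['UPTIME', '108\n']` and `['UPTIME', '109\n']` are different items.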
#18
(May-25-2024, 07:40 AM)snippsat Wrote: Dos this give the wantent result?

does nothing, value stays in deleted data dict...

root@ssap: /tmp # ./aix_reg_client_tcp_dev.py --run-now
changed: {'SAG01112_SSAP_HA_LPM': [['OS_TYPE', 'AIXi']]}
ONLY ECHO, NOTHING SENT TO SERVER
DEBUG ON: STD|1|SAG01112_SSAP_HA_LPM|OS_TYPE|AIXi|63b560175e8a15e180b58498f6910fb2
deleted: {'SAG01112_SSAP_HA_LPM': [['OS_TYPE', 'AIX']], 'SAP': [], 'C11_RG': [], 'W11_RG': []}
del filtered {'SAG01112_SSAP_HA_LPM': [['OS_TYPE', 'AIX']], 'SAP': [], 'C11_RG': [], 'W11_RG': []}
ONLY ECHO, NOTHING SENT TO SERVER
DEBUG ON: DEL|1|SAG01112_SSAP_HA_LPM|OS_TYPE|AIX|63b560175e8a15e180b58498f6910fb2
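
In this run the two entries are not equal: `changed` holds `['OS_TYPE', 'AIXi']` and `deleted` holds `['OS_TYPE', 'AIX']`, so any comparison of whole `[name, value]` pairs leaves `deleted` untouched. If the intent is to drop a deletion whenever the same attribute name is being changed (an assumption, since the thread never states it), the comparison would have to look only at the first element of each pair. A sketch with a hypothetical helper, `drop_changed_names`:

```python
def drop_changed_names(deleted, changed):
    """Hypothetical helper: drop [name, value] pairs from `deleted` whose
    name also appears in `changed` under the same key."""
    result = {}
    for key, items in deleted.items():
        changed_names = {name for name, _value in changed.get(key, [])}
        result[key] = [item for item in items if item[0] not in changed_names]
    return result


changed = {"SAG01112_SSAP_HA_LPM": [["OS_TYPE", "AIXi"]]}
deleted = {"SAG01112_SSAP_HA_LPM": [["OS_TYPE", "AIX"]], "SAP": [], "C11_RG": [], "W11_RG": []}

del_filtered = drop_changed_names(deleted, changed)
print(del_filtered)
# {'SAG01112_SSAP_HA_LPM': [], 'SAP': [], 'C11_RG': [], 'W11_RG': []}
```

Matching on the name alone is probably too coarse for multi-valued names such as `IP`: one changed IP would suppress the deletion of every other IP under that key.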
#19
(May-26-2024, 08:27 PM)DeaD_EyE Wrote: My try...

shows weird behaviour...data is added to the delete data dict instead of removal...

root@ssap: /tmp # ./aix_reg_client_tcp_dev.py --run-now
changed: {'SAG01112_SSAP_HA_LPM': [['OS_TYPE', 'AIXi']]}
ONLY ECHO, NOTHING SENT TO SERVER
DEBUG ON: STD|1|SAG01112_SSAP_HA_LPM|OS_TYPE|AIXi|63b560175e8a15e180b58498f6910fb2
deleted: {'SAG01112_SSAP_HA_LPM': [['OS_TYPE', 'AIX']], 'SAP': [], 'C11_RG': [], 'W11_RG': []}
all_keys: ['SAG01112_SSAP_HA_LPM', 'SAP', 'C11_RG', 'W11_RG']
del filtered {'SAG01112_SSAP_HA_LPM': [['OS_TYPE', 'AIX'], ['OS_TYPE', 'AIXi']], 'SAP': [], 'C11_RG': [], 'W11_RG': []}
ONLY ECHO, NOTHING SENT TO SERVER
DEBUG ON: DEL|1|SAG01112_SSAP_HA_LPM|OS_TYPE|AIX|63b560175e8a15e180b58498f6910fb2
DEBUG ON: DEL|1|SAG01112_SSAP_HA_LPM|OS_TYPE|AIXi|63b560175e8a15e180b58498f6910fb2
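
The output is consistent with what `deduplicate` is written to do: it merges the value lists of all dicts per key and drops repeats, which is a union and not a difference. A small sketch contrasting the two operations on the one key from the output above (the variable names are illustrative):

```python
deleted_items = [["OS_TYPE", "AIX"]]
changed_items = [["OS_TYPE", "AIXi"]]

# Ordered union: every distinct item from both lists, first occurrence wins.
union = []
for item in deleted_items + changed_items:
    if item not in union:
        union.append(item)
print(union)  # [['OS_TYPE', 'AIX'], ['OS_TYPE', 'AIXi']]

# Difference: items of the first list that are not in the second.
difference = [item for item in deleted_items if item not in changed_items]
print(difference)  # [['OS_TYPE', 'AIX']]
```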
#20
ok, kudos to all for your input. i guess this is not doable in python...i will have to find other ways/languages/whatsoever

thank you!