compacting a mutable sequence
i am looking for some already existing implementation of this logic i am ready to implement for myself, if necessary. i want to "compact" a mutable sequence by removing duplicates, keeping the order of both the unique items and each first item of duplicates. converting to a set loses the original order, so it is not an option to convert to a set and convert back, but maybe a set can be used to help detect duplicates in an implementation. anyone know of an existing implementation?
there's zlib:
import zlib


class TryZlibBz2:
    def __init__(self):
        your_text = str.encode('i am looking for some already existing implementation of this logic i am ready to implement for myself, if necessary. i want to "compact" a mutable sequence by removing duplicates, keeping the order of both the unique items and each first item of duplicates. converting to a set loses the original order, so it is not an option to convert to a set and convert back, but maybe a set can be used to help detect duplicates in an implementation. anyone know of an existing implementation?')
        compressed = self.zlib_encode(your_text)
        print(f'orig len: {len(your_text)}, len compressed: {len(compressed)}')
        print(f'\n{self.zlib_decode(compressed).decode()}')

    def zlib_encode(self, text):
        return zlib.compress(text)

    def zlib_decode(self, enctext):
        return zlib.decompress(enctext)

if __name__ == '__main__':
    TryZlibBz2()
results:
Output:
orig len: 484, len compressed: 263

i am looking for some already existing implementation of this logic i am ready to implement for myself, if necessary. i want to "compact" a mutable sequence by removing duplicates, keeping the order of both the unique items and each first item of duplicates. converting to a set loses the original order, so it is not an option to convert to a set and convert back, but maybe a set can be used to help detect duplicates in an implementation. anyone know of an existing implementation?
sorry i was not clear. i did not mean compression. maybe deduplicate or unduplicate would have been a better choice of terminology. and i expected to do this directly to mutable sequences such as lists.

[7, 4, 3, 4, 9, 4, 'foo', 3, 1, 7, 5] -> [7, 4, 3, 9, 'foo', 1, 5]

the duplicate items are removed and those items that stay keep their original order.
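a minimal sketch of that behavior for hashable items, using only the standard library: OrderedDict.fromkeys() drops duplicate keys and keeps first-seen order (the variable names here are just for illustration):

from collections import OrderedDict

data = [7, 4, 3, 4, 9, 4, 'foo', 3, 1, 7, 5]
# dict keys are unique; OrderedDict remembers first-seen insertion order
compacted = list(OrderedDict.fromkeys(data))
print(compacted)  # [7, 4, 3, 9, 'foo', 1, 5]

this only works when every item is hashable, since the items become dict keys.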
Don't you know unique_everseen()? Look in the itertools documentation (it is one of the recipes).
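for reference, the unique_everseen() recipe in the itertools docs is roughly the following (paraphrased from the docs, so check them for the exact version):

from itertools import filterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    seen = set()
    if key is None:
        # fast path: yield elements whose value has not been seen yet
        for element in filterfalse(seen.__contains__, iterable):
            seen.add(element)
            yield element
    else:
        # remember key(element) instead of the element itself
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen.add(k)
                yield element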
(Jan-22-2018, 06:29 AM)Gribouillis Wrote: Don't you know unique_everseen()? Look in the itertools documentation (it is one of the recipes).

no ... i do not know of it. i do now. thanks! i suspected this might exist but had no idea what it was called.

but... the unique_everseen() recipe is limited to hashable items. i don't know if i will need to support non-hashable items or not. so i still might need to implement my own, in which case a deep compare would be handy (have not looked for one, yet)
(Jan-22-2018, 06:46 AM)Skaperen Wrote: i don't know if i will need to support non-hashable items or not.
If you don't know, don't implement it! More seriously, there is a key parameter in unique_everseen(). It should be able to handle many non-hashable cases, because a non-hashable item can have a hashable key.
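for example, the key can turn an unhashable item into a hashable stand-in; key=tuple handles lists of hashable values (this uses the third-party more_itertools package, which ships the documented recipes, but the recipe shown above behaves the same way):

from more_itertools import unique_everseen

data = [[1, 2], [3, 4], [1, 2], [5, 6]]        # lists are not hashable
print(list(unique_everseen(data, key=tuple)))  # but tuples of their items are
# prints [[1, 2], [3, 4], [5, 6]]

the key only has to be hashable and to agree with equality for the items in question.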
if it turns out that the lists i need to compact might have non-hashable objects, then i will simply need to use a less efficient method to keep the memory of what objects have been seen. what i am thinking about is to start running the hashable way (keeping seen objects in a set), and if any object is ever found to be unhashable, switch over to the method that handles unhashable objects (keeping seen objects in a list that is scanned linearly). that way i can have the O(n) average performance of keeping a set, but still be able to work with a mutable sequence of unhashable objects.
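a sketch of a per-item variant of that idea (the name compact() is just for illustration): hashable items go through the set, and only an item that raises TypeError on hashing falls back to the linear scan, so hashable items keep the fast path even after an unhashable one shows up:

def compact(seq):
    """Order-preserving dedup: a set for hashable items, a list fallback otherwise."""
    seen_set = set()   # hashable items seen so far, O(1) average membership test
    seen_list = []     # unhashable items seen so far, O(n) linear scan
    result = []
    for item in seq:
        try:
            if item in seen_set:   # raises TypeError if item is unhashable
                continue
            seen_set.add(item)
        except TypeError:          # unhashable: remember it in the list instead
            if item in seen_list:
                continue
            seen_list.append(item)
        result.append(item)
    return result

print(compact([7, [1, 2], 7, [1, 2], 3]))  # [7, [1, 2], 3]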