![]() |
compacting a mutable sequence - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: General Coding Help (https://python-forum.io/forum-8.html) +--- Thread: compacting a mutable sequence (/thread-7713.html) |
compacting a mutable sequence - Skaperen - Jan-22-2018 i am looking for some already existing implementation of this logic i am ready to implement for myself, if necessary. i want to "compact" a mutable sequence by removing duplicates, keeping the order of both the unique items and each first item of duplicates. converting to a set loses the original order, so it is not an option to convert to a set and convert back, but maybe a set can be used to help detect duplicates in an implementation. anyone know of an existing implementation? RE: compacting a mutable sequence - Larz60+ - Jan-22-2018 there's zlib: import zlib class TryZlibBz2: def __init__(self): your_text = str.encode('i am looking for some already existing implementation of this logic i am ready to implement for myself, if necessary. i want to "compact" a mutable sequence by removing duplicates, keeping the order of both the unique items and each first item of duplicates. converting to a set loses the original order, so it is not an option to convert to a set and convert back, but maybe a set can be used to help detect duplicates in an implementation. anyone know of an existing implementation?') compressed = self.zlib_encode(your_text) print(f'orig len: {len(your_text)}, len compressed: {len(compressed)}') print(f'\n{self.zlib_decode(compressed).decode()}') def zlib_encode(self, text): return zlib.compress(text) def zlib_decode(self, enctext): return zlib.decompress(enctext) if __name__ == '__main__': TryZlibBz2()results:
RE: compacting a mutable sequence - Skaperen - Jan-22-2018 sorry i was not clear. i did not mean compression. maybe deduplicate or unduplicate would have been a better choice of terminology. and i expected to do this directly to mutable sequences such as lists. [7,4,3,4,9,4,'foo',3,1,7,5] -> [7,4,3,9,'foo',1,5] the duplicate items are removed and those items that stay keep their original order. RE: compacting a mutable sequence - Gribouillis - Jan-22-2018 Don't you know unique_everseen() ? Look in itertools documentation.
RE: compacting a mutable sequence - Skaperen - Jan-22-2018 (Jan-22-2018, 06:29 AM)Gribouillis Wrote: Don't you know no ... i do not know of it. i do now. thanks! i suspected this might exist but had no ideal what name. but... itertools.unique_everseen() is limited to hashable items. i don't know if i will need to support non-hashable or not. so i still might need to implement my own, in which case a deep compare would be handy (have not looked for one,yet)
RE: compacting a mutable sequence - Gribouillis - Jan-22-2018 (Jan-22-2018, 06:46 AM)Skaperen Wrote: i don't know if i will need to support non-hashable or not.If you don't know, don't implement it! More seriously, there is a key parameter in unique_everseen . It should be able to handle many non hashable cases because a non hashable item can have a hashable key.
RE: compacting a mutable sequence - Skaperen - Jan-23-2018 if it turns out that the lists i need to compact might have non-hashable objects then i will simply need to use a less efficient method to keep the memory of what objects have been seen. what i am thinking about is to start running the hashable way (keeping seen objects in a set) while hashing each object, and if any object is ever found to be unhashable, then switch over to the method that handles unhashable objexts (keeping seen objects in a list that is scanned linearly). that way i can have the O(n*log(n)) performance of keeping a set, but still be able to work with a mutable sequence of unhashable objects. |