compacting a mutable sequence

Skaperen · Jan-22-2018, 03:44 AM

i am looking for some already existing implementation of this logic i am ready to implement for myself, if necessary. i want to "compact" a mutable sequence by removing duplicates, keeping the order of both the unique items and each first item of duplicates. converting to a set loses the original order, so it is not an option to convert to a set and convert back, but maybe a set can be used to help detect duplicates in an implementation. anyone know of an existing implementation?

**Larz60+** · Jan-22-2018, 04:19 AM

there's zlib:

import zlib


class TryZlibBz2:
    def __init__(self):
        your_text = str.encode('i am looking for some already existing implementation of this logic i am ready to implement for myself, if necessary. i want to "compact" a mutable sequence by removing duplicates, keeping the order of both the unique items and each first item of duplicates. converting to a set loses the original order, so it is not an option to convert to a set and convert back, but maybe a set can be used to help detect duplicates in an implementation. anyone know of an existing implementation?')
        compressed = self.zlib_encode(your_text)
        print(f'orig len: {len(your_text)}, len compressed: {len(compressed)}')
        print(f'\n{self.zlib_decode(compressed).decode()}')

    def zlib_encode(self, text):
        return zlib.compress(text)

    def zlib_decode(self, enctext):
        return zlib.decompress(enctext)

if __name__ == '__main__':
    TryZlibBz2()

results:

Output:orig len: 484, len compressed: 263

i am looking for some already existing implementation of this logic i am ready to implement for myself, if necessary. i want to "compact" a mutable sequence by removing duplicates, keeping the order of both the unique items and each first item of duplicates. converting to a set loses the original order, so it is not an option to convert to a set and convert back, but maybe a set can be used to help detect duplicates in an implementation. anyone know of an existing implementation?

Skaperen · Jan-22-2018, 06:17 AM

sorry i was not clear. i did not mean compression. maybe deduplicate or unduplicate would have been a better choice of terminology. and i expected to do this directly to mutable sequences such as lists.

[7,4,3,4,9,4,'foo',3,1,7,5] -> [7,4,3,9,'foo',1,5]

the duplicate items are removed and those items that stay keep their original order.

**Gribouillis** · Jan-22-2018, 06:29 AM

Don't you know unique_everseen() ? Look in itertools documentation.

Skaperen · (This post was last modified: Jan-22-2018, 06:46 AM by Skaperen.)

(Jan-22-2018, 06:29 AM)Gribouillis Wrote: Don't you know unique_everseen() ? Look in itertools documentation.

no ... i do not know of it. i do now. thanks! i suspected this might exist but had no ideal what name.

but... itertools.unique_everseen() is limited to hashable items. i don't know if i will need to support non-hashable or not. so i still might need to implement my own, in which case a deep compare would be handy (have not looked for one,yet)

**Gribouillis** · (This post was last modified: Jan-22-2018, 06:54 AM by Gribouillis.)

(Jan-22-2018, 06:46 AM)Skaperen Wrote: i don't know if i will need to support non-hashable or not.

If you don't know, don't implement it! More seriously, there is a key parameter in unique_everseen. It should be able to handle many non hashable cases because a non hashable item can have a hashable key.

Skaperen · Jan-23-2018, 03:54 AM

if it turns out that the lists i need to compact might have non-hashable objects then i will simply need to use a less efficient method to keep the memory of what objects have been seen. what i am thinking about is to start running the hashable way (keeping seen objects in a set) while hashing each object, and if any object is ever found to be unhashable, then switch over to the method that handles unhashable objexts (keeping seen objects in a list that is scanned linearly). that way i can have the O(n*log(n)) performance of keeping a set, but still be able to work with a mutable sequence of unhashable objects.

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	using mutable in function defintion as optional paramter	akbarza	8	2,067	Apr-27-2024, 09:59 PM Last Post: snippsat
	mutable argument in function definition	akbarza	1	1,228	Dec-15-2023, 02:00 PM Last Post: deanhystad
	mutable values to string items?	fozz	15	5,255	Aug-30-2022, 07:20 PM Last Post: deanhystad
	"'DataFrame' objects are mutable, thus they cannot be hashed"	Mark17	1	8,501	Dec-25-2020, 02:31 AM Last Post: tinh
	Mutable Strings	millpond	3	3,649	Aug-24-2020, 08:42 AM Last Post: millpond
	What is the meaning of mutable data type?	qliu	3	4,099	Apr-17-2020, 07:20 PM Last Post: deanhystad
	copying parts of mutable sequences	Skaperen	1	2,919	Dec-02-2019, 10:34 AM Last Post: Gribouillis
	A mutable string??	silentknight85	5	6,495	May-31-2019, 10:11 AM Last Post: silentknight85
	compacting multiple ranges	Skaperen	2	3,810	Oct-11-2017, 08:33 AM Last Post: Skaperen

compacting a mutable sequence

User Panel Messages

Announcements