Posts: 4,647
Threads: 1,494
Joined: Sep 2016
i am looking for some already existing implementation of this logic i am ready to implement for myself, if necessary. i want to "compact" a mutable sequence by removing duplicates, keeping the order of both the unique items and each first item of duplicates. converting to a set loses the original order, so it is not an option to convert to a set and convert back, but maybe a set can be used to help detect duplicates in an implementation. anyone know of an existing implementation?
Tradition is peer pressure from dead people
What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
Posts: 12,031
Threads: 485
Joined: Sep 2016
there's zlib:
import zlib
class TryZlibBz2:
def __init__(self):
your_text = str.encode('i am looking for some already existing implementation of this logic i am ready to implement for myself, if necessary. i want to "compact" a mutable sequence by removing duplicates, keeping the order of both the unique items and each first item of duplicates. converting to a set loses the original order, so it is not an option to convert to a set and convert back, but maybe a set can be used to help detect duplicates in an implementation. anyone know of an existing implementation?')
compressed = self.zlib_encode(your_text)
print(f'orig len: {len(your_text)}, len compressed: {len(compressed)}')
print(f'\n{self.zlib_decode(compressed).decode()}')
def zlib_encode(self, text):
return zlib.compress(text)
def zlib_decode(self, enctext):
return zlib.decompress(enctext)
if __name__ == '__main__':
TryZlibBz2() results:
Output: orig len: 484, len compressed: 263
i am looking for some already existing implementation of this logic i am ready to implement for myself, if necessary. i want to "compact" a mutable sequence by removing duplicates, keeping the order of both the unique items and each first item of duplicates. converting to a set loses the original order, so it is not an option to convert to a set and convert back, but maybe a set can be used to help detect duplicates in an implementation. anyone know of an existing implementation?
Posts: 4,647
Threads: 1,494
Joined: Sep 2016
sorry i was not clear. i did not mean compression. maybe deduplicate or unduplicate would have been a better choice of terminology. and i expected to do this directly to mutable sequences such as lists.
[7,4,3,4,9,4,'foo',3,1,7,5] -> [7,4,3,9,'foo',1,5]
the duplicate items are removed and those items that stay keep their original order.
Tradition is peer pressure from dead people
What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
Posts: 4,790
Threads: 76
Joined: Jan 2018
Don't you know unique_everseen() ? Look in itertools documentation.
Posts: 4,647
Threads: 1,494
Joined: Sep 2016
Jan-22-2018, 06:46 AM
(This post was last modified: Jan-22-2018, 06:46 AM by Skaperen.)
(Jan-22-2018, 06:29 AM)Gribouillis Wrote: Don't you know unique_everseen() ? Look in itertools documentation.
no ... i do not know of it. i do now. thanks! i suspected this might exist but had no ideal what name.
but... itertools.unique_everseen() is limited to hashable items. i don't know if i will need to support non-hashable or not. so i still might need to implement my own, in which case a deep compare would be handy (have not looked for one,yet)
Tradition is peer pressure from dead people
What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
Posts: 4,790
Threads: 76
Joined: Jan 2018
Jan-22-2018, 06:54 AM
(This post was last modified: Jan-22-2018, 06:54 AM by Gribouillis.)
(Jan-22-2018, 06:46 AM)Skaperen Wrote: i don't know if i will need to support non-hashable or not. If you don't know, don't implement it! More seriously, there is a key parameter in unique_everseen . It should be able to handle many non hashable cases because a non hashable item can have a hashable key.
Posts: 4,647
Threads: 1,494
Joined: Sep 2016
if it turns out that the lists i need to compact might have non-hashable objects then i will simply need to use a less efficient method to keep the memory of what objects have been seen. what i am thinking about is to start running the hashable way (keeping seen objects in a set) while hashing each object, and if any object is ever found to be unhashable, then switch over to the method that handles unhashable objexts (keeping seen objects in a list that is scanned linearly). that way i can have the O(n*log(n)) performance of keeping a set, but still be able to work with a mutable sequence of unhashable objects.
Tradition is peer pressure from dead people
What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
|