Python Forum
compacting a mutable sequence
#1
i am looking for some already existing implementation of this logic, which i am ready to implement myself if necessary. i want to "compact" a mutable sequence by removing duplicates, keeping the original order of the unique items and of the first occurrence of each duplicated item. converting to a set loses the original order, so converting to a set and back is not an option, but maybe a set can be used to help detect duplicates in an implementation. anyone know of an existing implementation?
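to make the goal concrete: for hashable items, the dict.fromkeys() trick already behaves this way (assuming Python 3.7+, where dict keys are guaranteed to keep insertion order), though it does not help with unhashable items:

from collections import OrderedDict

data = [3, 1, 3, 2, 1]
# dict keys keep their insertion order, so the first occurrence of each
# item survives and later duplicates are silently dropped
print(list(dict.fromkeys(data)))          # [3, 1, 2]
# on older Pythons the same idea works with OrderedDict.fromkeys()
print(list(OrderedDict.fromkeys(data)))   # [3, 1, 2]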
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
there's zlib:
import zlib


class TryZlibBz2:
    def __init__(self):
        your_text = str.encode('i am looking for some already existing implementation of this logic i am ready to implement for myself, if necessary. i want to "compact" a mutable sequence by removing duplicates, keeping the order of both the unique items and each first item of duplicates. converting to a set loses the original order, so it is not an option to convert to a set and convert back, but maybe a set can be used to help detect duplicates in an implementation. anyone know of an existing implementation?')
        compressed = self.zlib_encode(your_text)
        print(f'orig len: {len(your_text)}, len compressed: {len(compressed)}')
        print(f'\n{self.zlib_decode(compressed).decode()}')

    def zlib_encode(self, text):
        return zlib.compress(text)

    def zlib_decode(self, enctext):
        return zlib.decompress(enctext)

if __name__ == '__main__':
    TryZlibBz2()
results:
Output:
orig len: 484, len compressed: 263

i am looking for some already existing implementation of this logic i am ready to implement for myself, if necessary. i want to "compact" a mutable sequence by removing duplicates, keeping the order of both the unique items and each first item of duplicates. converting to a set loses the original order, so it is not an option to convert to a set and convert back, but maybe a set can be used to help detect duplicates in an implementation. anyone know of an existing implementation?
#3
sorry i was not clear. i did not mean compression. maybe deduplicate or unduplicate would have been a better choice of terminology. and i expected to do this directly to mutable sequences such as lists.

[7,4,3,4,9,4,'foo',3,1,7,5] -> [7,4,3,9,'foo',1,5]

the duplicate items are removed and those items that stay keep their original order.
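a minimal sketch of what i mean, assuming for the moment that every item is hashable (the compact() name is just for illustration); slice assignment keeps it an in-place operation on the list:

def compact(seq):
    # remove duplicates from a mutable sequence in place, keeping the
    # first occurrence of each item (hashable items only)
    seen = set()
    keep = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            keep.append(item)
    seq[:] = keep      # slice assignment mutates the original sequence
    return seq

items = [7, 4, 3, 4, 9, 4, 'foo', 3, 1, 7, 5]
compact(items)
print(items)           # [7, 4, 3, 9, 'foo', 1, 5]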
#4
Don't you know unique_everseen()? Look in the itertools documentation.
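Roughly, the recipe in the docs boils down to a generator like this (lightly simplified here):

from itertools import filterfalse

def unique_everseen(iterable, key=None):
    # yield unique elements in order, remembering every element seen so far
    seen = set()
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen.add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen.add(k)
                yield element

print(list(unique_everseen('AAAABBBCCDAABBB')))   # ['A', 'B', 'C', 'D']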
#5
(Jan-22-2018, 06:29 AM)Gribouillis Wrote: Don't you know unique_everseen()? Look in the itertools documentation.

no ... i do not know of it. i do now. thanks! i suspected this might exist but had no idea what the name was.

but... the unique_everseen() recipe is limited to hashable items. i don't know if i will need to support non-hashable or not. so i still might need to implement my own, in which case a deep compare would be handy (have not looked for one, yet)
#6
(Jan-22-2018, 06:46 AM)Skaperen Wrote: i don't know if i will need to support non-hashable or not.
If you don't know, don't implement it! More seriously, there is a key parameter in unique_everseen. It should be able to handle many non-hashable cases, because a non-hashable item can have a hashable key.
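For example, lists are not hashable, but a tuple of the same values is, so key=tuple works (shown here with more_itertools, which packages the recipe as a ready-made function):

# pip install more-itertools; its unique_everseen is the packaged recipe
from more_itertools import unique_everseen

rows = [[1, 2], [3, 4], [1, 2], [5, 6], [3, 4]]
# the inner lists are unhashable, but tuple(row) is a hashable key, so
# duplicates are detected while the original list objects are yielded
print(list(unique_everseen(rows, key=tuple)))   # [[1, 2], [3, 4], [5, 6]]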
#7
if it turns out that the lists i need to compact might have non-hashable objects, then i will simply need a less efficient way to remember which objects have been seen. what i am thinking of is to start out the hashable way (keeping seen objects in a set), and if any object ever turns out to be unhashable, switch over to the method that handles unhashable objects (keeping seen objects in a list that is scanned linearly). that way i keep the roughly O(n) performance of a set for as long as possible, but can still work with a mutable sequence of unhashable objects.
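a rough sketch of that fallback idea might look like this (the compact() name is just for illustration):

def compact(seq):
    # remember seen items in a set while everything hashes; the first
    # unhashable item switches the memory to a plain list that is
    # scanned linearly from then on
    seen = set()
    use_set = True
    keep = []
    for item in seq:
        if use_set:
            try:
                if item in seen:
                    continue
                seen.add(item)
                keep.append(item)
                continue
            except TypeError:        # unhashable item encountered
                seen = list(seen)    # switch to linear scanning
                use_set = False
        # list mode: membership test compares items one by one
        if item not in seen:
            seen.append(item)
            keep.append(item)
    seq[:] = keep                    # compact the original sequence in place
    return seq

print(compact([7, 4, [1, 2], 4, [1, 2], 'foo', 7]))   # [7, 4, [1, 2], 'foo']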