Python Forum
a set of more complex data
#1
Sets are limited to hashable types. I am looking for a data type where I can stuff in many complex (not hashable) data objects without needing to retain their order. A list retains the order. I am wondering if there is something that can be more efficient than a list but, unlike a set, can hold any data type. I expect to have over a million of these objects, some containing over 100 items, storing a total of a few million items.
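A minimal illustration of the limitation described above, assuming the "complex" objects are something like plain dicts holding lists (the record names here are hypothetical). A set must hash every member, so it rejects them; a list only stores references, so it accepts anything and its order can simply be ignored.

```python
# Hypothetical records: plain dicts holding lists, so they are unhashable.
record_a = {"name": "alpha", "tags": ["x", "y"]}
record_b = {"name": "beta", "tags": ["z"]}

set_error = None
try:
    as_set = {record_a, record_b}   # a set must hash each member
except TypeError as exc:
    set_error = str(exc)            # the message mentions "unhashable type"

# A list stores references to any objects; its order can simply be ignored.
as_list = [record_a, record_b]
print(set_error)
print(len(as_list))
```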
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
The other main container types all maintain order (including dictionaries, as of 3.7). Can't you just ignore the ordering? Sets, of course, can only contain unique, hashable items (they need not all be the same type), and dictionaries require unique, hashable keys.
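A quick check of the two points above: a set happily mixes types as long as every member is hashable, and a plain dict iterates in insertion order (a language guarantee since Python 3.7). The variable names are only for illustration.

```python
# Sets require hashable members, not members of a single type.
mixed = {1, "one", (1, "one"), frozenset({2})}
print(len(mixed))  # 4

# Dicts keep insertion order (guaranteed since Python 3.7).
d = {}
d["b"] = 1
d["a"] = 2
print(list(d))  # ['b', 'a']
```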

For only a few million items, pretty much any of the containers will do the job. If the data objects are large, though, you might want to use __slots__ to save memory (by default, instance attributes are stored in a per-instance dictionary), and possibly use a database (the obvious option being the near-ubiquitous SQLite) if the data footprint is too large to hold comfortably in memory.
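A small sketch of the __slots__ suggestion, using hypothetical record classes with two fields. The slotted class reserves fixed storage for its declared attributes and has no per-instance __dict__; its attributes stay mutable, but it cannot gain new, undeclared ones.

```python
class RecordPlain:
    """Hypothetical record class: attributes live in a per-instance __dict__."""

    def __init__(self, name, body):
        self.name = name
        self.body = body


class RecordSlotted:
    """Same fields, but __slots__ reserves fixed storage and drops __dict__."""

    __slots__ = ("name", "body")

    def __init__(self, name, body):
        self.name = name
        self.body = body


plain = RecordPlain("alpha", "x" * 80)
slotted = RecordSlotted("alpha", "x" * 80)

print(hasattr(plain, "__dict__"))    # True
print(hasattr(slotted, "__dict__"))  # False

# Declared attributes can still be changed in place.
slotted.body = "y" * 80

# Trade-off: a slotted instance cannot grow attributes that were not declared.
try:
    slotted.extra = 1
except AttributeError as exc:
    print("AttributeError:", exc)
```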

Efficiency really depends on the nature of the data and the primary operations you intend to carry out. Some data structures are better optimised for certain operations than others.

Keep in mind that libraries like NumPy are written (mostly) in C to provide a significant performance advantage over standard Python code in number crunching and array manipulation, but I have no idea whether that is useful for your kind of data.
I am trying to help you, really, even if it doesn't always seem that way
#3
Hashability isn't about complexity. The requirements for a hash are that the hash of an object does not change, and that if two objects are equal, they have the same hash. No matter how complex your objects are, if they meet those two criteria, you can give them a hash with the __hash__ method (kept consistent with __eq__).
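A minimal sketch of that idea with a hypothetical Record class: __hash__ is built from exactly the fields __eq__ compares, and the tag list is copied into a tuple so the hash input cannot change afterwards. Two background facts worth knowing: a class that defines __eq__ without __hash__ becomes unhashable, while a class that defines neither is hashed by identity. Also note that if such an object's fields were mutated while it sat in a set, the set could no longer find it, which matters for objects that change.

```python
class Record:
    """Hypothetical record; treat its fields as read-only once created."""

    def __init__(self, name, tags):
        self.name = name
        self.tags = tuple(tags)  # immutable copy so the hash input cannot change

    def __eq__(self, other):
        if not isinstance(other, Record):
            return NotImplemented
        return (self.name, self.tags) == (other.name, other.tags)

    def __hash__(self):
        # Hash exactly the fields __eq__ compares, so equal records hash equally.
        return hash((self.name, self.tags))


records = {Record("alpha", ["x", "y"]), Record("alpha", ["x", "y"]), Record("beta", ["z"])}
print(len(records))  # 2: the two "alpha" records are equal, so the set keeps one
```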
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
#4
Yes, I can ignore the order; I just don't need them shuffled. My thought was that something in between might exist, such as a kind of set that does not need items to be hashable. I do not intend to get items by any key or test whether they are in there. I will just get them all out at once in arbitrary order (it can be an iterator).

But if such a thing does not exist, it is trivial to just use a list: I can append items and take them out in list order.

Yes, these objects can be big. Typically they have 10 to 20 data items (mostly strings) that run 40 to 100 characters each. I have seen extreme cases with items exceeding 2000 characters and objects with over 100 items, though generally not both extremes at the same time.
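A rough back-of-envelope estimate of what those sizes mean in memory, using hypothetical mid-range figures from this post (a million objects, 15 items each, 70-character ASCII strings). It counts only the string objects plus one pointer per stored reference, ignoring the containers' own overhead; in CPython, strings with characters beyond Latin-1 take 2 or 4 bytes per character, so real data may cost more.

```python
import struct
import sys

# Hypothetical sizes based on the figures described above; adjust as needed.
NUM_OBJECTS = 1_000_000
ITEMS_PER_OBJECT = 15
CHARS_PER_ITEM = 70

sample = "x" * CHARS_PER_ITEM
per_string = sys.getsizeof(sample)    # string object including interpreter header
per_reference = struct.calcsize("P")  # one pointer slot in a list/tuple/dict

estimate = NUM_OBJECTS * ITEMS_PER_OBJECT * (per_string + per_reference)
print(f"roughly {estimate / 2**30:.1f} GiB for the strings and references alone")
```

If the result is a sizeable fraction of available RAM, that is the point where the database suggestion earlier in the thread starts to pay off.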

The objects can and do change, but not while stored in this container. I could convert them to immutable types and then convert them back, but that would be very wasteful.

I don't have any other need to hash them, and no need to look them up, as one might check whether some string is in a set. I only need to put the items in, then take them all out in any order. "Taking them out" does not need to leave the container empty: if it can give me every item exactly once, missing none, while actually leaving them in there, that's fine. So yes, a list would work. If there is a more efficient way that meets my needs, I want to know; if not, I want to know so I can stop looking early.
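For the access pattern described here (add everything, then get every item exactly once), a plain list already fits well: in CPython it is a contiguous array of references, append is amortized O(1), and it accepts unhashable objects. The sketch below shows both variants the post mentions; the drain helper is a hypothetical name, and it pops from the end because that is O(1) and order does not matter.

```python
def drain(items):
    """Yield every object in `items` once, emptying the list as it goes.

    pop() from the end of a list is O(1); popping from the front would be O(n).
    """
    while items:
        yield items.pop()


bag = []
for i in range(5):
    bag.append({"id": i, "tags": ["a", "b"]})  # unhashable dicts are fine

# Option 1: read everything, leaving the list intact.
seen = [rec["id"] for rec in bag]

# Option 2: take everything out, leaving the list empty.
drained = [rec["id"] for rec in drain(bag)]

print(seen)     # [0, 1, 2, 3, 4]
print(drained)  # [4, 3, 2, 1, 0]
```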