Dec-29-2021, 04:28 PM
(Dec-29-2021, 04:25 PM)deanhystad Wrote: Sets may not preserve order, but they have order. Items in the set are ordered the same regardless of the order they appear in the list used to make the set.
import random x = list(range(60,70)) print(x, set(x)) random.shuffle(x) print(x, set(x))Order is not imposed by the print command. When printing a set, the order is the same order the set values are returned if you use "for".
Output:[60, 61, 62, 63, 64, 65, 66, 67, 68, 69] {64, 65, 66, 67, 68, 69, 60, 61, 62, 63} [67, 61, 69, 68, 65, 66, 60, 63, 64, 62] {64, 65, 66, 67, 68, 69, 60, 61, 62, 63}
x = list(range(60,70)) y = set(x) print(y, list(y))Because sets exist to do set operations, it makes sense that elements are stored in a way to facilitate this. Hash tables are used when you need a way to quickly access items in large collections, so it would make sense for the set values to be organized using their hash values.
Output:{64, 65, 66, 67, 68, 69, 60, 61, 62, 63} [64, 65, 66, 67, 68, 69, 60, 61, 62, 63]
print([hash(y) for y in set(range(60, 70))])It appears that the has value of an in is the int value, so I decided to find the first place where the set order did not match the hash order.
Output:[64, 65, 66, 67, 68, 69, 60, 61, 62, 63]
x = list(range(1000)) y = set(x) for a, b in zip(x, y): if a != b: print(a, b) breakHmmm, the order of the values/hash values was the same as the order of the set values. I printed the set to verify and noticed that the set was ordered 0, 1....999 in numerical order. So it would appear that the set order depends not only on the hash value but also on the set size. This is not surprising since all hash table algorithms take the table size into account.
Output:
Our set size is 10 so a reasonable has table size is 16 giving us a mask of 15 (b1111). Let's see what order we get if we use the hash value and a mask of 15 to get the hash table index.
mask = 15 x = [(y & mask, y) for y in range(60, 70)] x.sort() print(x) print(set(range(60, 70)))Woohoo! Nailed it on the first try! If we mask the last 4 bits of the hash value, the order of the resulting index matches the order of the values in the set.
Output:[(0, 64), (1, 65), (2, 66), (3, 67), (4, 68), (5, 69), (12, 60), (13, 61), (14, 62), (15, 63)] {64, 65, 66, 67, 68, 69, 60, 61, 62, 63}
Does this matter? No. I think the details of the Python set ordering algorithm are unimportant as long as it is efficient and correct. I cannot see any reason for trying to predict the order of values in a set since as defined in Python a set is unordered. The ordering might also change from one Python implementation to another. This worked for C Python, but J Python or Iron Python might use different algorithms for ordering sets.
This is great!
Yeah I didn't really intend to use this knowledge to predict the order of any item. I just found it very interesting and was curious on why it behaved like this.
Thanks for figuring it out!