Python Forum
Peculiar pattern from printing of sets
#4
(Dec-29-2021, 04:25 PM) deanhystad wrote:
Sets may not preserve insertion order, but they do have an order. In this example, the items in the set come out in the same order regardless of the order they appear in the list used to make the set.
import random
x = list(range(60,70))
print(x, set(x))
random.shuffle(x)
print(x, set(x))
Output:
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69] {64, 65, 66, 67, 68, 69, 60, 61, 62, 63}
[67, 61, 69, 68, 65, 66, 60, 63, 64, 62] {64, 65, 66, 67, 68, 69, 60, 61, 62, 63}
The order is not imposed by the print command. When you print a set, the values appear in the set's iteration order, which is the same order a "for" loop returns them.
x = list(range(60,70))
y = set(x)
print(y, list(y))
Output:
{64, 65, 66, 67, 68, 69, 60, 61, 62, 63} [64, 65, 66, 67, 68, 69, 60, 61, 62, 63]
Because sets exist to do set operations, it makes sense that elements are stored in a way to facilitate this. Hash tables are used when you need a way to quickly access items in large collections, so it would make sense for the set values to be organized using their hash values.
print([hash(y) for y in set(range(60, 70))])
Output:
[64, 65, 66, 67, 68, 69, 60, 61, 62, 63]
It appears that the hash value of an int is the int value, so I decided to find the first place where the set order did not match the hash order.
x = list(range(1000))
y = set(x)
for a, b in zip(x, y):
    if a != b:
        print(a, b)
        break
Output:
Hmmm, nothing was printed: the order of the values/hash values was the same as the order of the set values. I printed the set to verify and noticed that it was ordered 0, 1, ..., 999 in numerical order. So it would appear that the set order depends not only on the hash value but also on the set size. This is not surprising, since hash tables generally take the table size into account when mapping a hash value to a slot.
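The two observations above can be checked separately. This is only a sketch: both results describe CPython behaviour and are not guaranteed by the language, so no particular output is promised for other implementations.

```python
# Check the assumption that small non-negative ints hash to themselves.
# This is how CPython behaves; other implementations may differ.
small_ints_hash_to_self = all(hash(n) == n for n in range(1000))
print(small_ints_hash_to_self)

# Compare the set's iteration order with sorted order for a few sizes.
# Whether these match is an implementation detail, not a promise.
for size in (10, 100, 1000):
    values = range(size)
    print(size, list(set(values)) == sorted(values))
```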

Our set size is 10, so a plausible hash table size is 16, giving us a mask of 15 (0b1111). Let's see what order we get if we use the hash value and a mask of 15 to compute the hash table index.
mask = 15
x = [(y & mask, y) for y in range(60, 70)]
x.sort()
print(x)
print(set(range(60, 70)))
Output:
[(0, 64), (1, 65), (2, 66), (3, 67), (4, 68), (5, 69), (12, 60), (13, 61), (14, 62), (15, 63)]
{64, 65, 66, 67, 68, 69, 60, 61, 62, 63}
Woohoo! Nailed it on the first try! If we keep only the last 4 bits of the hash value, the order of the resulting index matches the order of the values in the set.
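One caveat worth checking: the match is consistent with a mask of 15, but it does not pin down the table size. Below is a toy model, with a hypothetical helper `slot_order` that uses linear probing and no resizing. CPython's real set uses a different probing scheme, so treat this as an illustration of the idea only. In this model, table sizes 16, 32 and 64 all give the same order for 60..69, and only a larger table changes it.

```python
def slot_order(values, table_size):
    """Return values in slot order for a toy open-addressing hash table.

    Hypothetical helper: start at hash(v) & mask, then probe linearly.
    This is NOT CPython's actual algorithm, just a simplified model.
    """
    if len(values) > table_size:
        raise ValueError("table too small for the given values")
    mask = table_size - 1  # table_size is assumed to be a power of two
    slots = [None] * table_size
    for v in values:
        i = hash(v) & mask
        # On a collision with a different value, try the next slot.
        while slots[i] is not None and slots[i] != v:
            i = (i + 1) & mask
        slots[i] = v
    # Iterating the table walks the slots from index 0 upward.
    return [v for v in slots if v is not None]


values = list(range(60, 70))
for size in (16, 32, 64, 128):
    print(size, slot_order(values, size))
```

With a table of 128 slots every value from 60 to 69 fits at its own index, so the toy model returns them in plain numerical order.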

Does this matter? No. I think the details of the Python set ordering algorithm are unimportant as long as it is efficient and correct. I cannot see any reason for trying to predict the order of values in a set since, as defined in Python, a set is unordered. The ordering might also change from one Python implementation to another. This worked for CPython, but Jython or IronPython might use different algorithms for ordering sets.
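If code ever does need the values in a predictable order, a small sketch of the usual approach is to sort explicitly instead of relying on the set's iteration order:

```python
values = set(range(60, 70))

# sorted() returns a new list in ascending order, whatever order
# the set happens to iterate in.
ordered = sorted(values)
print(ordered)
```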


This is great!

Yeah, I didn't really intend to use this knowledge to predict the order of any item. I just found it very interesting and was curious about why it behaved like this.
Thanks for figuring it out!


Messages In This Thread
Peculiar pattern from printing of sets - by SahandJ - Dec-29-2021, 01:37 PM
RE: Peculiar pattern from printing of sets - by SahandJ - Dec-29-2021, 04:28 PM
