Python Forum
The way to chase how the set of integer is iterated in mutating loop
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
The way to chase how the set of integer is iterated in mutating loop
#1
Hi! I'm new to Python source code reading, but I'm curious and I want to chase the following example's execution on CPython source level
x = set([1])
for i in x:
    y = x.pop()+1
    x.add(y)
print(x)
above code always print {8} on my M2 Mac.
I'm wondering if something happens on __next__ and __iter__ and __hash__ methods on set, but take a glance at CPython source code taught me they are implemented in C. I'm not familiar with the source code connections between C and Python codes on CPython repository

How can I follow this? Also, is this behavior of the set-iterator of integer noted on some doc? Thanks in advance!
Reply
#2
I don't think it's documented beyond "don't modify objects while you're iterating over them". The iterator may do odd things in those cases.

Python sets are hash tables that are dynamically sized. So almost certainly the iterator is moving over each of the hash buckets in turn. The initial size of the set has 8 buckets using the 3 LSB of the hash, and small integers hash to themselves.

So the int 1 has hash == 1 and ends up in bucket #1 of the set.

The for loop iterator grabs the element, but then in the loop that element is removed and another element is added. Because it's the next integer, it hashes into the next bucket. So the loop can keep going until it gets to the last bucket of the set (7).

When 7 is removed and 8 is added, it goes into bucket 0. The iterator is done with all the buckets so completes the iteration and the set is printed (which has the integer 8 as the only element).
khei4 and Gribouillis like this post
Reply
#3
More experiments
N = 8882
x = set(range(1, N))
for v in range(2, N):
    x.remove(v)
print(x)
for i in x:
    y = x.pop()+1
    x.add(y)
print(x)
Output:
{1} {3}
The C code for sets is in the file objects/setobject.c in the repository. The relevant header files are include/cpython/setobject.h and include/setobject.h. You could perhaps use the ctypes module to examine the live C objects, but there is some work to do to achieve this. You could alse perhaps call Python from a C program and use a C debugger to see what's going on in the set object.
khei4 likes this post
Reply
#4
Thank you for both replies! It's interesting and reasonable!
Then, the corresponding set's table idx and integer's SLB 3bit depend on the set's default table size with 1-item list initialization before resizing, and its behavior might be different with other implementations except for CPython, right?

Actually, I found the comments in CPython's source code, probably it's set's __next__. Although I'm not sure the wording around undefined behavior in C/C++ corresponds to Python, mutating while using it as an iterator is a kind of undefined or unexpected behavior, right?

Using FFI to track it sounds useful! I'll try it! Thanks :)
Reply
#5
I found this blog post interesting too. It may gives ideas to investigate what happens under the hood.
khei4 likes this post
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Output in for LOOP iterated Renym 1 1,522 Nov-17-2019, 09:15 PM
Last Post: perfringo

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020