Python Forum
itertools and amazing speed - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: itertools and amazing speed (/thread-38628.html)



itertools and amazing speed - Pedroski55 - Nov-07-2022

From the thread about getting random sets of image names we had 24 image names:

images = ['im1', 'im2', 'im3', 'im4', 'im5', 'im6', 'im7', 'im8',
'im9', 'im10', 'im11', 'im12', 'im13', 'im14', 'im15',
'im16', 'im17', 'im18','im19','im20','im21','im22','im23','im24']

Something funny is going on with itertools:

If I do this:

perm_set = itertools.permutations(images,9)
This completes in the time it takes to press and release the enter key.

There are n!/(n-r)! permutations for any given n and r.

Set n = 24, r = 9, we have 24!/15! = 474 467 051 520 permutations.

Has my computer really created nearly 500 trillion 9-string tuples in milliseconds? Seems unlikely!

I don't know what goes on behind the scenes in Python, I believe a lot of modules are written in C, but that still seems extremely fast!

My laptop is fast with 2 Ryzen R7 processors, but that beggars belief!

Has it really made 500 trillion tuples? Has itertools created a generator and nothing else??


RE: itertools and amazing speed - ndc85430 - Nov-07-2022

Yes, per the documentation for the module, the functions return iterators.


RE: itertools and amazing speed - deanhystad - Nov-07-2022

perm_set is a permutations object, a generator like object that creates results on demand. To see how long this takes you could ask for a list. I doubt you have enough memory.
perm_set = list(itertools.permutations(images,9))
That an itertools.permutations object is a generator is mentioned in the documentation when they say permutations is "roughly equivalent to"
def permutations(iterable, r=None):
    # permutations('ABCD', 2) --> AB AC AD BA BC BD CA CB CD DA DB DC
    # permutations(range(3)) --> 012 021 102 120 201 210
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    if r > n:
        return
    indices = list(range(n))
    cycles = list(range(n, n-r, -1))
    yield tuple(pool[i] for i in indices[:r])  # <------ The first tuple.
    while n:
        for i in reversed(range(r)):
            cycles[i] -= 1
            if cycles[i] == 0:
                indices[i:] = indices[i+1:] + indices[i:i+1]
                cycles[i] = n - i
            else:
                j = cycles[i]
                indices[i], indices[-j] = indices[-j], indices[i]
                yield tuple(pool[i] for i in indices[:r])   # <------ More tuples
                break
        else:
            return



RE: itertools and amazing speed - Pedroski55 - Nov-07-2022

Thanks!

I didn't try to make a list of itertools.permutations(images,9), but I did run:

for i in perm_set:
    print(i)
After I'd had breakfast it still hadn't finished, so I did ctrl + c. I hadn't looked at how many permutations 9 from 24 gives!

So how much memory will an iterator or generator object like that occupy, can you give an indication?

@dean Thanks for the function. It will take me a while to digest!


RE: itertools and amazing speed - bowlofred - Nov-08-2022

(Nov-07-2022, 11:07 PM)Pedroski55 Wrote: So how much memory will an iterator or generator object like that occupy, can you give an indication?

The generator is code. Each time it is called, it runs code to create the next item. So it only needs enough memory to maintain the state where it needs to pick up the next time.

Unless you're storing all the output somewhere, this doesn't create any memory pressure. Previously calculated items are discarded.

Attempting to store all the output items in a list or similar (like myperms = list(perm_set)) is going to take a huge amount of memory.


RE: itertools and amazing speed - Pedroski55 - Nov-10-2022

The number of permutations of 9 from 24 is enormous, but the number of combinations is much less. As a list, it is < 12MB.

com_set = itertools.combinations(images, 9)
# creates a list of tuples
mylist = list(com_set)
len(mylist) # 1307504
print(sys.getsizeof(mylist))
Output:
>>> type(com_set) <class 'itertools.combinations'> >>> print(sys.getsizeof(com_set)) 144 >>>
What I'm wondering is however:

Is there some way to interfere with the value of the next pointer in the class com_set?

Some way to pass it a random number as the stored value?


RE: itertools and amazing speed - Gribouillis - Nov-10-2022

(Nov-10-2022, 12:49 AM)Pedroski55 Wrote: Is there some way to interfere with the value of the next pointer in the class com_set?
This is a slightly different problem, but I implemented a Combinations class with random access, which allows to access directly the n-th element of the sequence, even with very large numbers. I wonder why this feature is not included in itertools.combinations()


RE: itertools and amazing speed - Pedroski55 - Nov-10-2022

Impressive!

Thanks a lot, hope you don't mind me copying it.

I can't say I understand it yet, but I'll look at it until I do!


RE: itertools and amazing speed - Gribouillis - Nov-11-2022

(Nov-10-2022, 11:15 PM)Pedroski55 Wrote: Thanks a lot, hope you don't mind me copying it.
I added the MIT License, so that you can freely copy it and modify it.