Python Forum

Full Version: how good is the optimization?
how good is the optimization in Python? in this code will it cache the value from s[n] or will it fetch it twice?

if s[n] < 128 or s[n] > 191:
    return -2
or should i code this?

x = s[n]
if x < 128 or x > 191:
    return -2
for best performance.
I would use
if not (128 <= s[n] <= 191):
    return -2
but I'm not sure it is better at execution time. The timeit module should answer this question. It's also a good idea to look at the bytecode.
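For a quick check, something along these lines with timeit should do (just a sketch: s and n are made-up sample values, and the absolute numbers depend on your machine and Python version):

import timeit

# made-up sample data: s[n] is 150, inside 128..191, so the second
# comparison of the 'or' version is actually evaluated
setup = "s = bytes(range(256)); n = 150"

double = timeit.timeit("s[n] < 128 or s[n] > 191", setup=setup, number=1_000_000)
single = timeit.timeit("x = s[n]; x < 128 or x > 191", setup=setup, number=1_000_000)
chained = timeit.timeit("not (128 <= s[n] <= 191)", setup=setup, number=1_000_000)

print(f"two subscriptions : {double:.3f} s")
print(f"local variable    : {single:.3f} s")
print(f"chained comparison: {chained:.3f} s")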
Use dis to disassemble a function:
import dis


def func1(s, n):
    if s[n] < 128 or s[n] > 191:
        return -2

def func2(s, n):
    x = s[n]
    if x < 128 or x > 191:
        return -2

print('func1:')
print(dis.dis(func1))
print('\nfunc2:')
print(dis.dis(func2))
Output:
func1:
  5           0 LOAD_FAST                0 (s)
              2 LOAD_FAST                1 (n)
              4 BINARY_SUBSCR
              6 LOAD_CONST               1 (128)
              8 COMPARE_OP               0 (<)
             10 POP_JUMP_IF_TRUE        24
             12 LOAD_FAST                0 (s)
             14 LOAD_FAST                1 (n)
             16 BINARY_SUBSCR
             18 LOAD_CONST               2 (191)
             20 COMPARE_OP               4 (>)
             22 POP_JUMP_IF_FALSE       28

  6     >>   24 LOAD_CONST               3 (-2)
             26 RETURN_VALUE
        >>   28 LOAD_CONST               0 (None)
             30 RETURN_VALUE
None

func2:
  9           0 LOAD_FAST                0 (s)
              2 LOAD_FAST                1 (n)
              4 BINARY_SUBSCR
              6 STORE_FAST               2 (x)

 10           8 LOAD_FAST                2 (x)
             10 LOAD_CONST               1 (128)
             12 COMPARE_OP               0 (<)
             14 POP_JUMP_IF_TRUE        24
             16 LOAD_FAST                2 (x)
             18 LOAD_CONST               2 (191)
             20 COMPARE_OP               4 (>)
             22 POP_JUMP_IF_FALSE       28

 11     >>   24 LOAD_CONST               3 (-2)
             26 RETURN_VALUE
        >>   28 LOAD_CONST               0 (None)
             30 RETURN_VALUE
None
In func1 you see LOAD_FAST for s and n two times.
In func2, s and n are accessed only once; the result of the subscription is stored in x and reused.

Usually what costs time is lookups of nonlocal and global names. Function and method calls have been improved with Python 3.6 and 3.7.
When you want to do micro-optimization, you'll benefit from first assigning global names to local names.

import timeit
import math


def foo1(y):
    for i in range(1_000_000):
        yield math.sqrt(i) ** y

def foo2(y):
    sqrt = math.sqrt  # look up math.sqrt once and bind it to a local name
    for i in range(1_000_000):
        yield sqrt(i) ** y

run1 = timeit.timeit('list(foo1(10))', globals=globals(), number=50)
run2 = timeit.timeit('list(foo2(10))', globals=globals(), number=50)
print(f'Function with global lookup: {run1:.2f} s\nFunction with local lookup: {run2:.2f} s')
Output:
Function with global lookup: 12.97 s
Function with local lookup: 9.79 s
Test it with the timeit module.
Turn your power savings off; maybe also deactivate your CPU's turbo boost, if your CPU supports it.

In my case the CPU frequency is not fixed, so the test results are not always the same.
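If the frequency cannot be fixed, timeit.repeat with a few repetitions and taking the minimum gives a more stable number. A small sketch, reusing foo2 from the snippet above:

import timeit

# repeat the measurement several times and take the best run; that filters
# out most of the noise from frequency scaling and background load
results = timeit.repeat('list(foo2(10))', globals=globals(), repeat=5, number=10)
print(f'best of 5 runs: {min(results):.2f} s')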
In this version, there are only two LOAD_FAST and no STORE_FAST
>>> def func3(s, n):
...     if not (128 <= s[n] <= 191):
...         return -2
... 
>>> import dis
>>> dis.dis(func3)
  2           0 LOAD_CONST               1 (128)
              3 LOAD_FAST                0 (s)
              6 LOAD_FAST                1 (n)
              9 BINARY_SUBSCR
             10 DUP_TOP
             11 ROT_THREE
             12 COMPARE_OP               1 (<=)
             15 JUMP_IF_FALSE_OR_POP    27
             18 LOAD_CONST               2 (191)
             21 COMPARE_OP               1 (<=)
             24 JUMP_FORWARD             2 (to 29)
        >>   27 ROT_TWO
             28 POP_TOP
        >>   29 POP_JUMP_IF_TRUE        36

  3          32 LOAD_CONST               4 (-2)
             35 RETURN_VALUE
        >>   36 LOAD_CONST               0 (None)
             39 RETURN_VALUE
(Aug-09-2018, 04:54 AM)Skaperen Wrote: how good is the optimization in Python? in this code will it cache the value from s[n] or will it fetch it twice?

if s[n] < 128 or s[n] > 191:
    return -2
or should i code this?

x = s[n]
if x < 128 or x > 191:
    return -2
for best performance.

I'm not sure this is an optimization issue. s[n] could have a different value every time it's accessed, so it'd HAVE to be accessed twice. This is easy enough to test, even if you don't want to dig into the dis module:
>>> class foo:
...   def __getitem__(self, index):
...     print("indexed!")
...     return index ** 2
...
>>> x = foo()
>>> x[4]
indexed!
16
>>> if x[4] > 10 and x[4] < 50:
...   print("match")
...
indexed!
indexed!
match
Note that these analyses are Python-implementation-specific. IronPython, Jython and PyPy (especially PyPy) may perform differently (it wouldn't surprise me if PyPy detected that s[n] doesn't change here and optimized the second lookup away). CPython does no JIT'ing, so far as I know.
i like Gribouillis' solution of changing the expression to a single chained in-between test. when i first saw that Python could do it that way, i dismissed it in my mind as "i'm never going to do that" because i considered it a "noobie feature". but this case shows a justification for it. it would be even more so if it were a big complex expression.
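for example, the middle of a chained comparison is evaluated just once, which is easy to see with a stand-in function that announces each call (a toy sketch):

def expensive():
    # stand-in for a big complex expression
    print("evaluated")
    return 150

# the chained form evaluates expensive() only once; the 'or' form
# would have to call it twice (or need a temporary variable)
if not (128 <= expensive() <= 191):
    print("out of range")

running it prints "evaluated" a single time and nothing else, since 150 is in range.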

i'm going to have to look into the dis module. i haven't, yet. what i would like to get is the disassembly written to a file, and run it interpreted. is that what .pyc and .pyo files are, in binary form? if so, can the python engine make the .pyc file from it?

i've long wanted to have an assembly-like language with namespace-like means to store variables, for the purpose of getting students more used to this kind of programming before they have to get used to memory allocation aspects.

fyi, i once made an isbetween(a,b,c) macro in C that would make sure the expression to be tested as between two other expressions was one chunk of code. i also did that in 370 assembler macro language.
(Aug-10-2018, 12:23 AM)Skaperen Wrote: if so, can the python engine make the .pyc file from it?
You could perhaps use a module such as BytecodeAssembler for this (at least for Python 2).
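For the disassembly-to-a-file part of the question: dis.dis() accepts a file argument (Python 3.4+), and a .pyc is compiled from source by py_compile, not from a disassembly listing. A small sketch (the module filename is just a placeholder):

import dis
import py_compile

def func1(s, n):
    if s[n] < 128 or s[n] > 191:
        return -2

# write the human-readable disassembly to a text file
with open('func1.dis', 'w') as f:
    dis.dis(func1, file=f)

# a .pyc is produced from a source file (placeholder name here)
py_compile.compile('mymodule.py')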