Python Forum

Python will trip you up at some point or another. It can sometimes feel like a bug when it's not. What counts as a gotcha depends on who you are and on a number of other variables. One person's common sense might be another's gotcha, and vice versa. It can depend on what operating system you are using, what Python version you are using, what IDE you are using, what language you learned before Python, how much Python knowledge you have acquired, what your native language is (language barrier), etc.

Multiple Expressions with or keyword
This trips up so many people that it needs its own thread
http://python-forum.io/Thread-Multiple-e...or-keyword

range(len(sequence)) in for loops
Another one that deserves its own thread
https://python-forum.io/Thread-Basic-Nev...n-sequence

Install the correct bittype version for a 3rd party library

Linux is case sensitive

Naming your scripts and directories

Don't give your scripts the same name as a library you are importing, whether it is a 3rd party library or part of the standard library. For example, don't name your script pygame.py or your directory "pygame". Python will load your file and/or directory instead of the actual pygame you installed. The same goes for naming your files test.py or naming your directories "test": it will conflict with the standard library's test package. The same is true for every standard library module and every 3rd party library you have installed. Don't name your scripts sys, time, etc. Come up with unique filenames. A rule of thumb: no script, file, or directory in your package should share a name with any library that your code imports. To get a list of the standard library and 3rd party module names that you should avoid, execute help('modules') in that Python version's interpreter.

We show that there is no pygame file beforehand

metulburr@ubuntu:~$ ls | grep pygame
metulburr@ubuntu:~$

We show that pygame works

metulburr@ubuntu:~$ python

Python 2.7.12 (default, Jul  1 2016, 15:12:24) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pygame
>>> pygame.init()
(6, 0)

We create a pygame.py file in the same directory we run from

metulburr@ubuntu:~$ touch pygame.py

metulburr@ubuntu:~$ ls | grep pygame
pygame.py

pygame.init() is now unknown because we are importing our own pygame.py file instead

metulburr@ubuntu:~$ python

Python 2.7.12 (default, Jul  1 2016, 15:12:24) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pygame
>>> pygame.init()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'init'
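
A quick way to check a filename before you use it is to ask Python whether that name is already importable. This is only a minimal sketch: the helper name shadows_importable_module is made up for illustration, and it assumes you run it before the new file exists (otherwise it finds your own file).

```python
import importlib.util


def shadows_importable_module(filename):
    """Return True if a script called `filename` would share its name
    with a module Python can already import (stdlib or installed)."""
    # Strip a trailing ".py" so "time.py" is checked as "time".
    stem = filename[:-3] if filename.endswith(".py") else filename
    try:
        return importlib.util.find_spec(stem) is not None
    except (ImportError, ValueError):
        # find_spec can raise for unusual names; treat those as "no clash".
        return False


# The last name is a hypothetical, deliberately unique script name.
for candidate in ("sys.py", "time.py", "my_laser_game_main_xyz.py"):
    print(candidate, shadows_importable_module(candidate))
```

If you already suspect a clash, printing the imported module's __file__ attribute (for example pygame.__file__) shows which file Python actually loaded.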

A real example of naming your scripts simple names that can lead to hard to diagnose bugs for beginners

Forgetting __init__.py in your sub-directories
Python 2.x requires an __init__.py file in every sub-directory that you want to import as a package. In Python 3.3 and later it is optional.

Modifying a list (or other container) while iterating over it

for laser in lasers:
    ...
    if rm:
        lasers.remove(laser)

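This will not raise an error, but it will not do what you expect either. Removing an item shifts every later item down one index while the loop's internal position keeps advancing, so the item that follows each removed one is silently skipped. (Indexing with range(len(lasers)) while removing can give you an IndexError instead, and changing the size of a dict during iteration raises a RuntimeError.) It looks like a catch-22: you need to loop over the list in order to remove, but you cannot safely remove from the list as you loop. This is easily fixed by looping over a copy of the list and removing from the actual list. All you have to do to loop over a copy is add [:].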

for laser in lasers[:]:
    ...
    if rm:
        lasers.remove(laser)

The [:] makes a shallow copy of the list, the same result you get from copy.copy() in the copy module. If you have nested structures you may want a deep copy, in which case you need copy.deepcopy().
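
Here is a small runnable sketch of both loops on a plain list of numbers, so you can see what each one really does. The data and the name remove_twos are made up for illustration.

```python
# Removing while iterating over the same list: the second 2 is skipped,
# because the list shifts left under the loop's internal position.
lasers = [1, 2, 2, 3]
for laser in lasers:
    if laser == 2:
        lasers.remove(laser)
print(lasers)  # [1, 2, 3]


def remove_twos(items):
    """Loop over a shallow copy and remove from the real list."""
    for item in items[:]:
        if item == 2:
            items.remove(item)
    return items


print(remove_twos([1, 2, 2, 3]))  # [1, 3]
```

Another common option is to build a new list with a list comprehension and keep only the items you want.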

Having circular module dependencies

In short: don't have a module that imports another module which in turn imports the first module. A simple package structure could be

/my_program
    main.py 
    /data
        tools.py
        settings.py

In this example you would execute main.py to run your program. The main.py file imports the settings.py module, and the settings.py module imports the tools.py module. This is fine. However, a circular import arises when the tools.py module also imports the settings.py module. While there are hacks to get around this, it is best not to do it at all. You will come across odd errors such as

Error:
NameError: name 'ClassA' is not defined

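when ClassA is indeed defined in your source. It just has not been defined yet at the moment the other module asks for it, because the first module is still only partly executed. A simple fix is to restructure your program so that the modules do not import each other.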
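
If you want to see a circular import fail without touching your own project, here is a self-contained sketch. It writes two throwaway modules into a temporary directory; the module names demo_gotcha_settings and demo_gotcha_tools are invented for this demo. With plain "import x" statements as used here, the failure shows up as an AttributeError rather than the NameError above, but the cause is the same: ClassA does not exist yet when the second module asks for it.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module sources: each module imports the other one.
SETTINGS_SRC = "import demo_gotcha_tools\n\nclass ClassA:\n    pass\n"
TOOLS_SRC = "import demo_gotcha_settings\n\nhelper = demo_gotcha_settings.ClassA()\n"


def circular_import_error():
    """Import two throwaway modules that import each other.

    Returns the error message, or None if no error occurred.
    """
    with tempfile.TemporaryDirectory() as tmp:
        for name, src in (("demo_gotcha_settings", SETTINGS_SRC),
                          ("demo_gotcha_tools", TOOLS_SRC)):
            with open(os.path.join(tmp, name + ".py"), "w") as handle:
                handle.write(src)
        sys.path.insert(0, tmp)
        try:
            importlib.import_module("demo_gotcha_settings")
        except AttributeError as exc:
            # tools ran while settings was only partly executed.
            return str(exc)
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(tmp)
            sys.modules.pop("demo_gotcha_settings", None)
            sys.modules.pop("demo_gotcha_tools", None)
    return None


print(circular_import_error())
```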

Using global keyword when you should be using a class

empty list in a function argument

You should be wary of putting an empty list as a default argument in the header of a function. The list is created once, when the function is defined, and that same list is used in each successive call. Python's default arguments are evaluated once when the function is defined, not each time the function is called. This means that if you use a mutable default argument and mutate it, you will have mutated that object for all future calls to the function as well. A small example illustrating this would be

>>> def foo(bar=[ ]):
...     bar.append("baz")
...     return bar
... 
>>> foo()
['baz']
>>> foo()
['baz', 'baz']
>>> foo()
['baz', 'baz', 'baz']
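
The usual way around this is to use None as the default and create the list inside the function. A minimal sketch follows; foo_fixed is just an illustrative name for a corrected version of foo.

```python
def foo_fixed(bar=None):
    """Same as foo, but each call without an argument gets a fresh list."""
    if bar is None:
        # Created at call time, not at definition time.
        bar = []
    bar.append("baz")
    return bar


print(foo_fixed())  # ['baz']
print(foo_fixed())  # ['baz']
```

A caller can still pass in their own list, in which case that list is appended to as before.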

Failing to address differences between Python 2 and Python 3

I can't find where Python is installed
https://python-forum.io/Thread-Python-3-...6#pid11496

index variables leak into enclosing scope

This article explains it. I'll post it here instead of linking because over the years articles/blogs have a tendency to disappear...

I'll start with a quiz. What does this function do?

def foo(lst):
    a = 0
    for i in lst:
        a += i
    b = 1
    for t in lst:
        b *= i
    return a, b

If you think "computes the sum and product of the items in lst", don't feel too bad about yourself. The bug here is often tricky to spot. If you did see it, well done - but buried in mountains of real code, and when you don't know it's a quiz, discovering the bug is significantly more difficult.

The bug here is due to using i instead of t in the body of the second for loop. But wait, how does this even work? Shouldn't i be invisible outside of the first loop? [1] Well, no. In fact, Python formally acknowledges that the names defined as for loop targets (a more formally rigorous name for "index variables") leak into the enclosing function scope. So this:

for i in [1, 2, 3]:
    pass
print(i)

Is valid and prints 3, by design. In this writeup I want to explore why this is so, why it's unlikely to change, and also use it as a tracer bullet to dig into some interesting parts of the CPython compiler.

And by the way, if you're not convinced this behavior can cause real problems, consider this snippet:

def foo():
    lst = []
    for i in range(4):
        lst.append(lambda: i)
    print([f() for f in lst])

If you'd expect this to print [0, 1, 2, 3], no such luck. This code will, instead, emit [3, 3, 3, 3], because there's just a single i in the scope of foo, and this is what all the lambdas capture.
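
For comparison, here is a sketch of the same loop next to one common workaround: binding the current value of i as a default argument of the lambda. The function names late_binding and early_binding are made up for this illustration.

```python
def late_binding():
    """All lambdas look up the single i in this function when called."""
    funcs = []
    for i in range(4):
        funcs.append(lambda: i)
    return [f() for f in funcs]


def early_binding():
    """The default argument is evaluated when each lambda is created,
    so every lambda keeps its own copy of the value."""
    funcs = []
    for i in range(4):
        funcs.append(lambda i=i: i)
    return [f() for f in funcs]


print(late_binding())   # [3, 3, 3, 3]
print(early_binding())  # [0, 1, 2, 3]
```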

The official word

The Python reference documentation explicitly documents this behavior in the section on for loops:

Quote: The for-loop makes assignments to the variables(s) in the target list. [...] Names in the target list are not deleted when the loop is finished, but if the sequence is empty, they will not have been assigned to at all by the loop.

Note the last sentence - let's try it:

for i in []:
    pass
print(i)

Indeed, a NameError is raised. Later on, we'll see that this is a natural outcome of the way the Python VM executes its bytecode.

Why this is so
I actually asked Guido van Rossum about this behavior and he was gracious enough to reply with some historical background (thanks Guido!). The motivation is keeping Python's simple approach to names and scopes without resorting to hacks (such as deleting all the values defined in the loop after it's done - think about the complications with exceptions, etc.) or more complex scoping rules.

In Python, the scoping rules are fairly simple and elegant: a block is either a module, a function body or a class body. Within a function body, names are visible from the point of their definition to the end of the block (including nested blocks such as nested functions). That's for local names, of course; global names (and other nonlocal names) have slightly different rules, but that's not pertinent to our discussion.

The important point here is: the innermost possible scope is a function body. Not a for loop body. Not a with block body. Python does not have nested lexical scopes below the level of a function, unlike some other languages (C and its progeny, for example).

So if you just go about implementing Python, this behavior is what you'll likely end up with. Here's another enlightening snippet:

for i in range(4):
    d = i * 2
print(d)

Would it surprise you to find out that d is visible and accessible after the for loop is finished? No, this is just the way Python works. So why would the index variable be treated any differently?

By the way, the index variables of list comprehensions are also leaked to the enclosing scope. Or, to be precise, were leaked, before Python 3 came along.

Python 3 fixed the leakage from list comprehensions, along with other breaking changes. Make no mistake, changing such behavior is a major breakage in backwards compatibility. This is why I think the current behavior stuck and won't be changed.
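
A short sketch to check the difference on Python 3: the target of a list comprehension is gone afterwards, while the target of a plain for loop is still there. The function and variable names are made up for illustration.

```python
def comprehension_scope_demo():
    squares = [n_inside * n_inside for n_inside in range(4)]
    try:
        n_inside  # the comprehension target is not visible out here
    except NameError:
        leaked = False
    else:
        leaked = True
    for m in range(4):
        pass
    # m is still bound after the loop, holding the last value.
    return squares, leaked, m


print(comprehension_scope_demo())  # ([0, 1, 4, 9], False, 3)
```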

Moreover, many folks still find this a useful feature of Python. Consider:

for i, item in enumerate(somegenerator()):
    dostuffwith(i, item)
print('The loop executed {0} times!'.format(i+1))

If you have no idea how many items somegenerator actually returned, this is a pretty succinct way to know. Otherwise you'd have to keep a separate counter.
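
One caveat with this pattern: if the generator yields nothing, i is never assigned and the print line raises a NameError. A minimal sketch of a variant that also copes with the empty case, using a made-up helper name count_items, is to give the counter a starting value and let enumerate count from 1:

```python
def count_items(iterable):
    """Count the items of any iterable, including an empty one."""
    count = 0  # stays 0 if the loop body never runs
    for count, _item in enumerate(iterable, start=1):
        pass
    return count


print(count_items(ch for ch in "abc"))  # 3
print(count_items(iter([])))            # 0
```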

Here's another example:

for i in somegenerator():
    if isinteresting(i):
        break
dostuffwith(i)

Which is a useful pattern for finding things in a loop and using them afterwards [2].
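
Note that in the snippet above, if nothing interesting is found, i is simply the last item (or unbound if the generator was empty). A hedged sketch of the same search wrapped in a function, using the for/else clause to tell "found" from "not found", could look like this. The names first_interesting and is_interesting are illustrative, not from the article.

```python
def first_interesting(items, is_interesting):
    """Return the first item that passes the test, or None."""
    for item in items:
        if is_interesting(item):
            break
    else:
        # Runs only when the loop finished without hitting break.
        return None
    return item


print(first_interesting([1, 4, 9], lambda x: x > 3))  # 4
print(first_interesting([1, 2], lambda x: x > 3))     # None
```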

There are other uses people came up with over the years that justify keeping this behavior in place. It's hard enough to instill breaking changes for features the core developers deem detrimental and harmful. When the feature is argued by many to be useful, and moreover is used in a huge bunch of code in the real world, the chances of removing it are zero.

Under the hood

Now the fun part. Let's see how the Python compiler and VM conspire to make this behavior possible. In this particular case, I think the most lucid way to present things is going backwards from the bytecode. I hope this may also serve as an interesting example on how to go about digging in Python's internals [3] in order to find stuff out (it's so much fun, seriously!)

Let's take a part of the function presented at the start of this article and disassemble it:

def foo(lst):
    a = 0
    for i in lst:
        a += i
    return a

The resulting bytecode is:

 0 LOAD_CONST               1 (0)
 3 STORE_FAST               1 (a)

 6 SETUP_LOOP              24 (to 33)
 9 LOAD_FAST                0 (lst)
12 GET_ITER
13 FOR_ITER                16 (to 32)
16 STORE_FAST               2 (i)

19 LOAD_FAST                1 (a)
22 LOAD_FAST                2 (i)
25 INPLACE_ADD
26 STORE_FAST               1 (a)
29 JUMP_ABSOLUTE           13
32 POP_BLOCK

33 LOAD_FAST                1 (a)
36 RETURN_VALUE

As a reminder, LOAD_FAST and STORE_FAST are the opcodes Python uses to access names that are only used within a function. Since the Python compiler knows statically (at compile-time) how many such names exist in each function, they can be accessed with static array offsets as opposed to a hash table, which makes access significantly faster (hence the _FAST suffix). But I digress. What's really important here is that a and i are treated identically. They are both fetched with LOAD_FAST and modified with STORE_FAST. There is absolutely no reason to assume that their visibility is in any way different [4].
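
You can reproduce this yourself with the dis module. The sketch below is not part of the article; the exact opcodes and offsets vary between Python versions, so it only prints them and relies on the code object's co_varnames to show that a and i sit side by side in the function's local slots.

```python
import dis


def foo(lst):
    a = 0
    for i in lst:
        a += i
    return a


# Names stored in the function's fast-local array: the argument first,
# then the other locals. The loop target i is just one more entry.
print(foo.__code__.co_varnames)

# Which *_FAST style opcodes this interpreter version emits for foo.
fast_ops = sorted({ins.opname for ins in dis.get_instructions(foo)
                   if "FAST" in ins.opname})
print(fast_ops)
```

Calling dis.dis(foo) prints the full listing for whatever version you are running.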

So how did this come to be? Somehow, the compiler figured that i is just another local name within foo. This logic lives in the symbol table code, when the compiler walks over the AST to create a control-flow graph from which bytecode is later emitted; there are more details about this process in my article about symbol tables - so I'll just stick to the essentials here.

The symtable code doesn't treat for statements very specially. In symtable_visit_stmt we have:

case For_kind:
    VISIT(st, expr, s->v.For.target);
    VISIT(st, expr, s->v.For.iter);
    VISIT_SEQ(st, stmt, s->v.For.body);
    if (s->v.For.orelse)
        VISIT_SEQ(st, stmt, s->v.For.orelse);
    break;

The loop target is visited as any other expression. Since this code visits the AST, it's worthwhile to dump it to see how the node for the for statement looks:

For(target=Name(id='i', ctx=Store()),
    iter=Name(id='lst', ctx=Load()),
    body=[AugAssign(target=Name(id='a', ctx=Store()),
                    op=Add(),
                    value=Name(id='i', ctx=Load()))],
    orelse=[])

So i lives in a Name node. These are handled in the symbol table code by the following clause in symtable_visit_expr:

case Name_kind:
    if (!symtable_add_def(st, e->v.Name.id,
                          e->v.Name.ctx == Load ? USE : DEF_LOCAL))
        VISIT_QUIT(st, 0);
    /* ... */

Since the name i is clearly tagged with DEF_LOCAL (because of the *_FAST opcodes emitted to access it, but this is also easy to observe if the symbol table is dumped using the symtable module), the code above evidently calls symtable_add_def with DEF_LOCAL as the third argument. This is the right time to glance at the AST above and notice the ctx=Store part of the Name node of i. So it's the AST that already comes in carrying the information that i is stored to in the target part of the For node. Let's see how that comes to be.
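
As a sketch of the "dump the symbol table" route mentioned above, the standard library's symtable module can be pointed at the same function. This is an illustration added here, not code from the article; the filename "<example>" is arbitrary.

```python
import symtable

SOURCE = """
def foo(lst):
    a = 0
    for i in lst:
        a += i
    return a
"""

module_table = symtable.symtable(SOURCE, "<example>", "exec")
# The module's only child block is the function foo.
foo_table = module_table.get_children()[0]

for name in ("a", "i"):
    symbol = foo_table.lookup(name)
    # Both names report as local and assigned; the loop target is
    # not treated differently from the ordinary variable.
    print(name, symbol.is_local(), symbol.is_assigned())
```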

The AST-building part of the compiler goes over the parse tree (which is a fairly low-level hierarchical representation of the source code - some background is available here) and, among other things, sets the expr_context attributes on some nodes, most notably Name nodes. Think about it this way, in the following statement:

foo = bar + 1

Both foo and bar are going to end up in Name nodes. But while bar is only being loaded from, foo is actually being stored into in this code. The expr_context attribute is used to distinguish between uses for later consumption by the symbol table code [5].
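
The contexts are easy to inspect from Python with the ast module. This small sketch (added for illustration, not from the article) parses the assignment above together with a for loop and collects the context of every Name node:

```python
import ast

tree = ast.parse("foo = bar + 1\nfor i in lst:\n    pass\n")

# Map each name to the class name of its expr_context.
contexts = {
    node.id: type(node.ctx).__name__
    for node in ast.walk(tree)
    if isinstance(node, ast.Name)
}

for name in ("foo", "bar", "i", "lst"):
    print(name, contexts[name])
```

The assignment target foo and the loop target i both come back as Store, while bar and lst are Load.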

Back to our for loop targets, though. These are handled in the function that creates an AST for for statements - ast_for_for_stmt. Here are the relevant parts of this function:

static stmt_ty
ast_for_for_stmt(struct compiling *c, const node *n)
{
    asdl_seq *_target, *seq = NULL, *suite_seq;
    expr_ty expression;
    expr_ty target, first;

    /* ... */

    node_target = CHILD(n, 1);
    _target = ast_for_exprlist(c, node_target, Store);
    if (!_target)
        return NULL;
    /* Check the # of children rather than the length of _target, since
       for x, in ... has 1 element in _target, but still requires a Tuple. */
    first = (expr_ty)asdl_seq_GET(_target, 0);
    if (NCH(node_target) == 1)
        target = first;
    else
        target = Tuple(_target, Store, first->lineno, first->col_offset, c->c_arena);

    /* ... */

    return For(target, expression, suite_seq, seq, LINENO(n), n->n_col_offset,
               c->c_arena);
}

The Store context is created in the call to ast_for_exprlist, which creates the node for the target (recall that the for loop target may be a sequence of names for tuple unpacking, not just a single name).

This function is probably the most important part in the process of explaining why for loop targets are treated similarly to other names within the loop. After this tagging happens in the AST, the code for handling such names in the symbol table and VM is no different from other names.

Wrapping up

This article discusses a particular behavior of Python that may be considered a "gotcha" by some. I hope the article does a decent job of explaining how this behavior flows naturally from the naming and scoping semantics of Python, why it can be useful and hence is unlikely to ever change, and how the internals of the Python compiler make it work under the hood. Thanks for reading!
[1] Here I'm tempted to make a Microsoft Visual C++ 6 joke, but the fact that most readers of this blog in 2015 won't get it is somewhat disturbing (because it reflects my age, not the abilities of my readers).
[2] You could argue that dostuffwith(i) could go into the if right before the break here. But this isn't always convenient. Besides, according to Guido there's a nice separation of concerns here - the loop is used for searching, and only that. What happens with the value after the search is done is not the loop's concern. I think this is a very good point.
[3] As usual for my articles on Python's internals, this is about Python 3. Specifically, I'm looking at the default branch of the Python repository, where work on the next release (3.5) is being done. But for this particular topic, the source code of any release in the 3.x series should do.
[4] Another thing clear from the disassembly is why i remains invisible if the loop doesn't execute. The GET_ITER and FOR_ITER pair of opcodes treat the thing we loop over as an iterator and then call its __next__ method. If that call ends up raising StopIteration, the VM catches it and exits the loop. Only if an actual value is returned does the VM proceed to execute STORE_FAST to i, thus bringing it into existence for subsequent code to refer to.
[5] It's a curious design, which I suspect stems from the desire for relatively clean recursive visitation code in AST consumers such as the symbol table code and CFG generation.

I feel like you should at least link to the post about the or gotchas. I looked for it on this forum and couldn't find it, btw.

Ah, I didn't even think of that one. Doh!

It's in the tutorials... added it.

Ha! I missed it because it was sticky and I skipped past those as I was looking...

By the way, that thread shows the admin history retroactively.

That's because there really were 3 different thread splits. That was when snippsat merged all the tutorials together, and then we decided to split them. For some reason this thread was the base for all the splits, as opposed to each one being its own. In the same way, the thread http://python-forum.io/Thread-Namespace-...th-imports has 281K views because it took the sum of all the tutorial threads when he merged them and split them again. When he merged them, for some reason it took certain threads as the base and just summed up the info for all merged threads into that one, and left it there when it was split out again. I don't know how to fix it, but it's just the result of a one-time thing. The only way to remove it and keep that plugin is to delete that thread and make a new one with the same info.

metulburr

micseydel

metulburr

micseydel

metulburr