Oct-27-2016, 03:13 PM
Python will trip you up at some point or another. Sometimes it can feel like a bug when it's not. What counts as a gotcha depends on who you are and a number of other variables. One person's common sense might be another's gotcha, and vice versa. It can depend on what operating system you are using, what Python version you are using, what IDE you are using, what language you learned before Python, how much Python knowledge you have acquired, what your native language is (language barrier), etc.
Multiple Expressions with or keyword
This is a trip up for so many it needs its own thread
http://python-forum.io/Thread-Multiple-e...or-keyword
range(len(sequence)) in for loops
Another one that deserves its own thread
https://python-forum.io/Thread-Basic-Nev...n-sequence
Install the correct bittype version for a 3rd party library
A common mistake is installing a build of a library that does not match your Python version. An even more common mistake is installing a build whose bit type (32-bit or 64-bit) does not match the bit type of your Python interpreter. Your operating system's bit type has little influence: you can, for example, install 32-bit Python on 64-bit Windows. In that case, if you install 64-bit pygame because you see that you have 64-bit Windows, pygame is not going to install or work correctly.
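If you are not sure which bit type your interpreter is, you can ask it directly. This is a minimal sketch using only the standard library; the helper name interpreter_bits is made up for illustration.

```python
import platform
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running interpreter, in bits."""
    # "P" is the struct format code for a C pointer (void *).
    return struct.calcsize("P") * 8


print("Python", platform.python_version(), "is", interpreter_bits(), "bit")
# Another common check: sys.maxsize only exceeds 2**32 on a 64-bit build.
print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
```

Run this with the same interpreter you install the library for, and pick the library build that matches the number it reports, not the bit type of your operating system.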
Linux is case sensitive
If your program loads an image (or any file) that is named my_image.PNG as my_image.png, it will work on Windows. The same is not true on Linux, because Linux is case sensitive: my_image.PNG and my_image.png are two different files there. This might not affect you if you are on Windows, but if you plan on your program running on Linux, a case mismatch like this can be the sole cause of it not running there, and it is easy to fix.
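One way to catch this while still developing on Windows is to check that the name you load matches the name on disk exactly. The following is only a sketch; exact_case_exists is a hypothetical helper, not something from a library.

```python
import os
import tempfile


def exact_case_exists(path):
    """True only if the last path component matches a directory entry
    with exactly the same capitalisation."""
    directory, name = os.path.split(os.path.abspath(path))
    return os.path.isdir(directory) and name in os.listdir(directory)


with tempfile.TemporaryDirectory() as tmp:
    # Create an empty file whose extension is upper case.
    with open(os.path.join(tmp, "my_image.PNG"), "w"):
        pass
    print(exact_case_exists(os.path.join(tmp, "my_image.PNG")))  # True
    print(exact_case_exists(os.path.join(tmp, "my_image.png")))  # False
```

Because it compares against the directory listing instead of asking the OS to open the file, the check gives the same answer on Windows and on Linux.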
Naming your scripts and directories
Don't give your scripts the same name as a library you are importing, whether it is a 3rd party library or part of the standard library. For example, don't name your script pygame.py or your directory "pygame". Your program will try to load your file or directory instead of the actual pygame you installed. The same goes for naming your files test.py or naming your directories "test": they will conflict with the built-in test package. This is true for every standard library module and every 3rd party library you have installed. Don't name your scripts sys, time, etc. Come up with unique filenames. A rule of thumb: no script, file, or directory in your package should share a name with a library that your package imports. To get a list of the standard library and 3rd party module names that you should avoid, execute this in that Python version's interpreter:

    help('modules')

Here is a real example of how giving your script a simple name can lead to a bug that is hard for beginners to diagnose.

We show that there is no pygame file beforehand:

    metulburr@ubuntu:~$ ls | grep pygame
    metulburr@ubuntu:~$

We show that pygame works:

    metulburr@ubuntu:~$ python
    Python 2.7.12 (default, Jul 1 2016, 15:12:24)
    [GCC 5.4.0 20160609] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pygame
    >>> pygame.init()
    (6, 0)

We create a pygame.py file in the same directory we run from:

    metulburr@ubuntu:~$ touch pygame.py
    metulburr@ubuntu:~$ ls | grep pygame
    pygame.py

pygame.init() is no longer found, because we are now importing our own pygame.py file instead:

    metulburr@ubuntu:~$ python
    Python 2.7.12 (default, Jul 1 2016, 15:12:24)
    [GCC 5.4.0 20160609] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pygame
    >>> pygame.init()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'module' object has no attribute 'init'
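When you hit an AttributeError like this, a quick way to see which file Python actually imported is to look at the module's __file__ attribute. Here is a small sketch; module_location is a hypothetical helper, and json is used only because it ships with every Python install.

```python
import importlib
import os


def module_location(name):
    """Import a module by name and return the file it was loaded from."""
    module = importlib.import_module(name)
    # Modules compiled into the interpreter (such as sys) have no __file__.
    return getattr(module, "__file__", None)


location = module_location("json")
print(location)
if location and os.path.dirname(os.path.abspath(location)) == os.getcwd():
    print("Warning: this module was loaded from the current directory")
```

If the printed path points into your own project instead of the Python installation or site-packages, one of your files is shadowing the library.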
Forgetting __init__.py in your sub-directories
Python 2.x requires an __init__.py file in every sub-directory that you want to import as a package. In Python 3.3 and later it is optional.
Modifying a list (or other container) while iterating over it
    for laser in lasers:
        ...
        if rm:
            lasers.remove(laser)

This will not behave the way you expect, because you have just removed an item from the list that you are looping over. When you loop over the list itself, the item that follows each removed item is silently skipped. When you loop by index with range(len(lasers)) instead, you get an IndexError. It is a catch-22: you need to loop over the list in order to remove items, but you cannot remove from the list as you loop. This can easily be fixed by looping over a copy of the list and removing from the actual list. All you have to do to loop over a copy is add [:].

    for laser in lasers[:]:
        ...
        if rm:
            lasers.remove(laser)

The [:] makes a shallow copy. This is equivalent to using copy.copy() from the copy module. If you have nested structures you may want a deep copy, for which you need copy.deepcopy().
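To see the difference for yourself, here is a small self-contained sketch. The list of numbers stands in for the lasers, and "remove the even ones" stands in for whatever your rm condition is; both are assumptions made up for this example.

```python
lasers = [1, 2, 2, 3, 4]

# Buggy: removing from the list we are iterating over skips the next item.
buggy = list(lasers)
for laser in buggy:
    if laser % 2 == 0:
        buggy.remove(laser)
print(buggy)     # [1, 2, 3] - one of the even numbers survived

# Fixed: iterate over a shallow copy, remove from the real list.
fixed = list(lasers)
for laser in fixed[:]:
    if laser % 2 == 0:
        fixed.remove(laser)
print(fixed)     # [1, 3]

# Alternative: build a new list that keeps only what you want.
filtered = [laser for laser in lasers if laser % 2 != 0]
print(filtered)  # [1, 3]
```

The list comprehension is often the cleanest option when you do not need to modify the original list object in place.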
Having circular module dependencies
In short: don't have a module that imports another module which in turn imports the first module. A simple package structure could be

    /my_program
        main.py
        /data
            tools.py
            settings.py

In this example you would execute main.py to run your program. The main.py file imports the settings.py module. The settings.py module imports the tools.py module. This is fine. However, a circular import arises when the tools.py module also imports the settings.py module. There are hacks to get around this, but it is best not to do it at all. You will come across odd errors such as

    NameError: name 'ClassA' is not defined

even though ClassA is indeed defined in your code. It just has not been defined yet at the point where the import runs. A simple fix is to restructure your program so that this does not happen at all.
Using global keyword when you should be using a class
When you don't know what you're doing, using the global keyword to change a global name from inside a function is asking for trouble. Even experienced users need to be careful. New users often overuse the global keyword when they should be using a class instead. Sometimes they put off learning classes because of their complexity. But the global keyword, and the bugs it can introduce, makes code more confusing to maintain, whereas keeping the state in a class avoids these problems. To fix this, learn classes. When a class is the right tool, it removes the need for the global keyword entirely.
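As a rough illustration, here is what keeping state in a class can look like. The ScoreBoard name and its methods are invented for this sketch; the point is only that the state lives on the instance, so no function needs the global keyword.

```python
class ScoreBoard:
    """Holds the score as instance state instead of a module-level global."""

    def __init__(self):
        self.score = 0

    def add(self, points):
        # Modifies this instance only; nothing outside the object changes.
        self.score += points
        return self.score


player_one = ScoreBoard()
player_two = ScoreBoard()
player_one.add(10)
player_one.add(5)
player_two.add(1)
print(player_one.score)  # 15
print(player_two.score)  # 1
```

A side benefit is that you can have as many independent score boards as you like, which a single global variable cannot give you.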
empty list in a function argument
You should be wary of using an empty list as a default argument in the header of a function. Python's default arguments are evaluated once, when the function is defined, not each time the function is called. The list is therefore created a single time, and the same list is used in each successive call. This means that if you use a mutable default argument and mutate it, you have mutated that object for all future calls to the function as well. A small example illustrating this would be

    >>> def foo(bar=[]):
    ...     bar.append("baz")
    ...     return bar
    ...
    >>> foo()
    ['baz']
    >>> foo()
    ['baz', 'baz']
    >>> foo()
    ['baz', 'baz', 'baz']
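The usual way around this is to use None as the default and create the list inside the function. A minimal sketch of the same foo, rewritten that way:

```python
def foo(bar=None):
    # None is immutable, so it is safe as a default value.
    # A fresh list is created on every call that omits the argument.
    if bar is None:
        bar = []
    bar.append("baz")
    return bar


print(foo())  # ['baz']
print(foo())  # ['baz']

# Passing your own list still works and modifies that list.
mine = ["spam"]
print(foo(mine))  # ['spam', 'baz']
```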
Failing to address differences between Python 2 and Python 3
Quite a few things changed in Python 3.x. The latest builds of Python 2.7.x address some of these differences, but there is never going to be 100% compatibility. Most of the time, code can be written to run in both versions. The most noticeable changes are: print is a statement in 2.x and a function in 3.x, standard library modules were renamed or moved in 3.x, 2.x xrange() is now 3.x range(), 2.x raw_input() is now 3.x input(), division behaves differently, etc. There are a number of other changes as well. We also have a built-in interactive library of name changes between the two versions.
It should be noted that support for Python 2.x, originally due to end in 2015, was extended by 5 more years. However, this covers bug fixes and maintenance only. Nothing new is added. Python 2.x will be dead by 2020. You should make arrangements to switch your code to Python 3.x, or start with Python 3.x if you're just starting out.
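If you do have to support both versions for a while, one possible approach is to pick the right built-in once, at the top of your script. This is only a sketch; the names text_input and lazy_range are made up for this example.

```python
import sys

# Choose the equivalent built-ins depending on the running version.
if sys.version_info[0] >= 3:
    text_input = input       # 3.x: input() returns a string
    lazy_range = range       # 3.x: range() is already lazy
else:
    text_input = raw_input   # 2.x: raw_input() returns a string
    lazy_range = xrange      # 2.x: xrange() is the lazy version

print(sum(lazy_range(5)))    # 10
# Use // for floor division and a float operand for true division,
# so the result does not depend on the Python version.
print(7 // 2)                # 3
print(7 / 2.0)               # 3.5
```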
I can't find where Python is installed
https://python-forum.io/Thread-Python-3-...6#pid11496
index variables leak into enclosing scope
This article explains it. I'll post it here instead of linking, because over the years articles and blogs have a tendency to disappear.
I'll start with a quiz. What does this function do?
    def foo(lst):
        a = 0
        for i in lst:
            a += i
        b = 1
        for t in lst:
            b *= i
        return a, b

If you think "computes the sum and product of the items in lst", don't feel too bad about yourself. The bug here is often tricky to spot. If you did see it, well done - but buried in mountains of real code, and when you don't know it's a quiz, discovering the bug is significantly more difficult.
The bug here is due to using i instead of t in the body of the second for loop. But wait, how does this even work? Shouldn't i be invisible outside of the first loop? [1] Well, no. In fact, Python formally acknowledges that the names defined as for loop targets (a more formally rigorous name for "index variables") leak into the enclosing function scope. So this:
    for i in [1, 2, 3]:
        pass
    print(i)

is valid and prints 3, by design. In this writeup I want to explore why this is so, why it's unlikely to change, and also use it as a tracer bullet to dig into some interesting parts of the CPython compiler.
And by the way, if you're not convinced this behavior can cause real problems, consider this snippet:
    def foo():
        lst = []
        for i in range(4):
            lst.append(lambda: i)
        print([f() for f in lst])

If you'd expect this to print [0, 1, 2, 3], no such luck. This code will, instead, emit [3, 3, 3, 3], because there's just a single i in the scope of foo, and this is what all the lambdas capture.
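(Forum note, not part of the quoted article.) A common workaround is to bind the current value of i as a default argument of each lambda, since defaults are evaluated when the lambda is created. A minimal sketch, with foo_fixed being a name made up here:

```python
def foo_fixed():
    lst = []
    for i in range(4):
        # i=i copies the current value of i into the lambda's own default,
        # so each lambda remembers a different number.
        lst.append(lambda i=i: i)
    return [f() for f in lst]


print(foo_fixed())  # [0, 1, 2, 3]
```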
The official word
The Python reference documentation explicitly documents this behavior in the section on for loops:
Quote: The for-loop makes assignments to the variables(s) in the target list. [...] Names in the target list are not deleted when the loop is finished, but if the sequence is empty, they will not have been assigned to at all by the loop.
Note the last sentence - let's try it:
    for i in []:
        pass
    print(i)

Indeed, a NameError is raised. Later on, we'll see that this is a natural outcome of the way the Python VM executes its bytecode.
Why this is so
I actually asked Guido van Rossum about this behavior and he was gracious enough to reply with some historical background (thanks Guido!). The motivation is keeping Python's simple approach to names and scopes without resorting to hacks (such as deleting all the values defined in the loop after it's done - think about the complications with exceptions, etc.) or more complex scoping rules.
In Python, the scoping rules are fairly simple and elegant: a block is either a module, a function body or a class body. Within a function body, names are visible from the point of their definition to the end of the block (including nested blocks such as nested functions). That's for local names, of course; global names (and other nonlocal names) have slightly different rules, but that's not pertinent to our discussion.
The important point here is: the innermost possible scope is a function body. Not a for loop body. Not a with block body. Python does not have nested lexical scopes below the level of a function, unlike some other languages (C and its progeny, for example).
So if you just go about implementing Python, this behavior is what you're likely to end up with. Here's another enlightening snippet:

    for i in range(4):
        d = i * 2
    print(d)

Would it surprise you to find out that d is visible and accessible after the for loop is finished? No, this is just the way Python works. So why would the index variable be treated any differently?
By the way, the index variables of list comprehensions are also leaked to the enclosing scope. Or, to be precise, were leaked, before Python 3 came along.
Python 3 fixed the leakage from list comprehensions, along with other breaking changes. Make no mistake, changing such behavior is a major breakage in backwards compatibility. This is why I think the current behavior stuck and won't be changed.
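(Forum note, not part of the quoted article.) The difference between the two is easy to check. In this sketch the two function names are invented, and the stated results assume Python 3 and that no global named x or y exists:

```python
def comprehension_target_visible():
    squares = [x * x for x in range(3)]
    try:
        x  # the comprehension's x lives in its own scope on Python 3
    except NameError:
        return False
    return True


def loop_target_visible():
    for y in range(3):
        pass
    try:
        y  # the for loop target is an ordinary local of this function
    except NameError:
        return False
    return True


print(comprehension_target_visible())  # False on Python 3
print(loop_target_visible())           # True
```

On Python 2 the first function would return True as well, which is the leak described above.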
Moreover, many folks still find this a useful feature of Python. Consider:
    for i, item in enumerate(somegenerator()):
        dostuffwith(i, item)
    print('The loop executed {0} times!'.format(i+1))

If you have no idea how many items somegenerator actually returned, this is a pretty succinct way to know. Otherwise you'd have to keep a separate counter.
Here's another example:
    for i in somegenerator():
        if isinteresting(i):
            break
    dostuffwith(i)

This is a useful pattern for finding things in a loop and using them afterwards [2].
There are other uses people came up with over the years that justify keeping this behavior in place. It's hard enough to instill breaking changes for features the core developers deem detrimental and harmful. When the feature is argued by many to be useful, and moreover is used in a huge bunch of code in the real world, the chances of removing it are zero.
Under the hood
Now the fun part. Let's see how the Python compiler and VM conspire to make this behavior possible. In this particular case, I think the most lucid way to present things is going backwards from the bytecode. I hope this may also serve as an interesting example of how to go about digging in Python's internals [3] in order to find stuff out (it's so much fun, seriously!).
Let's take a part of the function presented at the start of this article and disassemble it:
    def foo(lst):
        a = 0
        for i in lst:
            a += i
        return a

The resulting bytecode is:

         0 LOAD_CONST               1 (0)
         3 STORE_FAST               1 (a)
         6 SETUP_LOOP              24 (to 33)
         9 LOAD_FAST                0 (lst)
        12 GET_ITER
        13 FOR_ITER                16 (to 32)
        16 STORE_FAST               2 (i)
        19 LOAD_FAST                1 (a)
        22 LOAD_FAST                2 (i)
        25 INPLACE_ADD
        26 STORE_FAST               1 (a)
        29 JUMP_ABSOLUTE           13
        32 POP_BLOCK
        33 LOAD_FAST                1 (a)
        36 RETURN_VALUE

As a reminder, LOAD_FAST and STORE_FAST are the opcodes Python uses to access names that are only used within a function. Since the Python compiler knows statically (at compile-time) how many such names exist in each function, they can be accessed with static array offsets as opposed to a hash table, which makes access significantly faster (hence the _FAST suffix). But I digress. What's really important here is that a and i are treated identically. They are both fetched with LOAD_FAST and modified with STORE_FAST. There is absolutely no reason to assume that their visibility is in any way different [4].
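(Forum note, not part of the quoted article.) You can reproduce a listing like this yourself with the standard dis module. The exact opcodes and offsets differ between Python versions, so treat the output above as a snapshot of one release; what stays the same is that a and i sit side by side in the function's table of local names.

```python
import dis


def foo(lst):
    a = 0
    for i in lst:
        a += i
    return a


# Print the bytecode of foo for whatever interpreter runs this script.
dis.dis(foo)

# The names accessed through the *_FAST opcodes, in slot order:
# the argument first, then locals in order of first appearance.
print(foo.__code__.co_varnames)  # ('lst', 'a', 'i')
```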
So how did this come to be? Somehow, the compiler figured that i is just another local name within foo. This logic lives in the symbol table code, when the compiler walks over the AST to create a control-flow graph from which bytecode is later emitted; there are more details about this process in my article about symbol tables - so I'll just stick to the essentials here.
The symtable code doesn't treat for statements very specially. In symtable_visit_stmt we have:
    case For_kind:
        VISIT(st, expr, s->v.For.target);
        VISIT(st, expr, s->v.For.iter);
        VISIT_SEQ(st, stmt, s->v.For.body);
        if (s->v.For.orelse)
            VISIT_SEQ(st, stmt, s->v.For.orelse);
        break;

The loop target is visited as any other expression. Since this code visits the AST, it's worthwhile to dump it to see how the node for the for statement looks:

    For(target=Name(id='i', ctx=Store()),
        iter=Name(id='lst', ctx=Load()),
        body=[AugAssign(target=Name(id='a', ctx=Store()),
                        op=Add(),
                        value=Name(id='i', ctx=Load()))],
        orelse=[])

So i lives in a Name node. These are handled in the symbol table code by the following clause in symtable_visit_expr:

    case Name_kind:
        if (!symtable_add_def(st, e->v.Name.id,
                              e->v.Name.ctx == Load ? USE : DEF_LOCAL))
            VISIT_QUIT(st, 0);
        /* ... */

Since the name i is clearly tagged with DEF_LOCAL (because of the *_FAST opcodes emitted to access it, but this is also easy to observe if the symbol table is dumped using the symtable module), the code above evidently calls symtable_add_def with DEF_LOCAL as the third argument. This is the right time to glance at the AST above and notice the ctx=Store part of the Name node of i. So it's the AST that already comes in carrying the information that i is stored to in the target part of the For node. Let's see how that comes to be.
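(Forum note, not part of the quoted article.) Before moving on, the symtable module mentioned above can be used to confirm how the compiler classified each name. A short sketch, where SOURCE is just the function from earlier stored as a string:

```python
import symtable

SOURCE = """
def foo(lst):
    a = 0
    for i in lst:
        a += i
    return a
"""

# Build the symbol table for the module, then take the table of foo,
# which is the only nested scope in this source.
module_table = symtable.symtable(SOURCE, "<example>", "exec")
foo_table = module_table.get_children()[0]

for name in ("lst", "a", "i"):
    sym = foo_table.lookup(name)
    print(name, "local:", sym.is_local(), "assigned:", sym.is_assigned())
```

Both a and i are reported as local and assigned, which matches the claim that the loop target is treated like any other local name.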
The AST-building part of the compiler goes over the parse tree (which is a fairly low-level hierarchical representation of the source code - some background is available here) and, among other things, sets the expr_context attributes on some nodes, most notably Name nodes. Think about it this way, in the following statement:
    foo = bar + 1

Both foo and bar are going to end up in Name nodes. But while bar is only being loaded from, foo is actually being stored into in this code. The expr_context attribute is used to distinguish between uses for later consumption by the symbol table code [5].
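(Forum note, not part of the quoted article.) The contexts can be inspected from Python with the standard ast module. This sketch parses the statement above and a small for loop and prints the context class of each Name node:

```python
import ast

# foo = bar + 1: foo is a Store, bar is a Load.
assign = ast.parse("foo = bar + 1").body[0]
target = assign.targets[0]    # Name node for foo
operand = assign.value.left   # Name node for bar
print(target.id, type(target.ctx).__name__)    # foo Store
print(operand.id, type(operand.ctx).__name__)  # bar Load

# The for loop target gets the same Store context as an assignment target.
loop = ast.parse("for i in lst:\n    pass").body[0]
print(loop.target.id, type(loop.target.ctx).__name__)  # i Store
print(loop.iter.id, type(loop.iter.ctx).__name__)      # lst Load
```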
Back to our for loop targets, though. These are handled in the function that creates an AST for for statements - ast_for_for_stmt. Here are the relevant parts of this function:
    static stmt_ty
    ast_for_for_stmt(struct compiling *c, const node *n)
    {
        asdl_seq *_target, *seq = NULL, *suite_seq;
        expr_ty expression;
        expr_ty target, first;

        /* ... */

        node_target = CHILD(n, 1);
        _target = ast_for_exprlist(c, node_target, Store);
        if (!_target)
            return NULL;
        /* Check the # of children rather than the length of _target, since
           for x, in ... has 1 element in _target, but still requires a Tuple. */
        first = (expr_ty)asdl_seq_GET(_target, 0);
        if (NCH(node_target) == 1)
            target = first;
        else
            target = Tuple(_target, Store, first->lineno, first->col_offset, c->c_arena);

        /* ... */

        return For(target, expression, suite_seq, seq,
                   LINENO(n), n->n_col_offset,
                   c->c_arena);
    }

The Store context is created in the call to ast_for_exprlist, which creates the node for the target (recall that the for loop target may be a sequence of names for tuple unpacking, not just a single name).
This function is probably the most important part in the process of explaining why for loop targets are treated similarly to other names within the loop. After this tagging happens in the AST, the code for handling such names in the symbol table and VM is no different from other names.
Wrapping up
This article discusses a particular behavior of Python that may be considered a "gotcha" by some. I hope the article does a decent job of explaining how this behavior flows naturally from the naming and scoping semantics of Python, why it can be useful and hence is unlikely to ever change, and how the internals of the Python compiler make it work under the hood. Thanks for reading!
[1] Here I'm tempted to make a Microsoft Visual C++ 6 joke, but the fact that most readers of this blog in 2015 won't get it is somewhat disturbing (because it reflects my age, not the abilities of my readers).
[2] You could argue that dostuffwith(i) could go into the if right before the break here. But this isn't always convenient. Besides, according to Guido there's a nice separation of concerns here - the loop is used for searching, and only that. What happens with the value after the search is done is not the loop's concern. I think this is a very good point.
[3] As usual for my articles on Python's internals, this is about Python 3. Specifically, I'm looking at the default branch of the Python repository, where work on the next release (3.5) is being done. But for this particular topic, the source code of any release in the 3.x series should do.
[4] Another thing clear from the disassembly is why i remains invisible if the loop doesn't execute. The GET_ITER and FOR_ITER pair of opcodes treat the thing we loop over as an iterator and then call its __next__ method. If that call ends up raising StopIteration, the VM catches it and exits the loop. Only if an actual value is returned does the VM proceed to execute STORE_FAST to i, thus bringing it into existence for subsequent code to refer to.
[5] It's a curious design, which I suspect stems from the desire for relatively clean recursive visitation code in AST consumers such as the symbol table code and CFG generation.