Sorry for being so late to this thread.
I believe that Dave is confused about Python's behaviour because he is stuck thinking about it in terms of passing arguments to functions by value versus passing them by reference.
Python does neither of these things.
(On the other hand, Dave seems to know full well why Python behaves the way it does, and he seems to understand that pass by value and pass by reference are not the only argument passing strategies available. He just wants Python to behave like some other language, maybe Visual Basic or C++.)
Let's start with what I imagine Dave's complaint is. Here is a simple example.
def add_and_print(obj, thing_to_add):
    obj += thing_to_add
    print(obj)
# Set up some variables with values in them.
x = 100
y = ['a', 'b']
add_and_print(x, 1) # Pass the variable x as an argument.
print("x is", x) # Has x changed? No.
add_and_print(y, ['c']) # Pass the variable y as an argument.
print("y is", y) # Has y changed? Yes.
If you run that code, you will see that after calling the function add_and_print the variable x keeps its old value, but the variable y has taken on the changed value. This is, in fairness to Dave, a little surprising, until you understand what is going on.
One explanation of the observed differences in behaviour would be:
"When you pass the integer x to a function, the compiler passes the variable "by value", making a copy of the integer and passing it into the function. So changing that argument inside the function only changes the copy, not the original.
But when you pass the list y to the function, the compiler passes the variable "by reference" (i.e. a pointer to the original), not a copy, so changes to the argument inside the function affect the original variable y."
If you imagine that there are only two possible ways to pass a variable to a function, by value or by reference, then this explanation seems to be the only possible explanation. Passing an integer makes a copy of it, passing a list passes a reference to the original list.
This is wrong. That is not what happens in Python, and it is not what happens in other popular languages like Java and JavaScript, which behave the same way as Python here.
Some of the confusion comes from an assembly language view of computer languages. At the very lowest software levels, namely assembly language or machine code, there actually are only two ways you can pass a value into a function: you can make a copy of the value, or you can pass an indirect reference (a pointer) to the original. If you think in terms of, say, the C language, that low-level view is all there is.
But in terms of higher level languages like Python and many others, there are many more options.
Even in Python, if you look under the hood at the details of how the interpreter works at the assembly language level of the code, the interpreter passes references (pointers to memory addresses) by value (i.e. making a copy of the pointer). This low-level view of the world is why you will sometimes see people insist that Java is call by value, but to do so, they have to make the astonishing claim that when you pass a value like 100 to a function, the actual value being passed is not 100 but some hidden, abstract "value" like 0x7f056d5a9dc0. [1]
To quote the Effbot:
"Joe, I think our son might be lost in the woods"
"Don't worry, I have his social security number"
While the deepest levels of the interpreter might operate by copying memory addresses, that's not very helpful to know. What we care about is not the invisible, inaccessible pointers to memory addresses of bits in memory, but the high-level objects like the int 100 and a list of characters.
So at the level of Python objects, what actually happened?
Regardless of whether you are passing 100 or a list, regardless of whether the argument comes from a variable, a constant or an expression, the Python interpreter passes the object itself into the function. So add_and_print(x, 1) passes the 100 object itself to the function, not a copy. The same happens when you pass y: the list object ['a', 'b'] itself gets passed.
Python always uses the same argument evaluation strategy, regardless of the type of object.
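A quick way to check the "object itself, not a copy" claim is to compare identities with the is operator. The following is only a sketch; hand_back and rebind are hypothetical helper names invented for this illustration, not part of the example above.

```python
def hand_back(obj):
    """Return the argument untouched so the caller can inspect it."""
    return obj

def rebind(obj):
    """Point the local name obj at a brand new list; the caller's list is not touched."""
    obj = ['something', 'else']
    return obj

x = 100
y = ['a', 'b']

# If arguments were copied on the way in, these would be different objects.
print(hand_back(x) is x)  # True
print(hand_back(y) is y)  # True

# If lists were passed "by reference", rebinding obj would change y. It does not.
rebound = rebind(y)
print(rebound)  # ['something', 'else']
print(y)        # ['a', 'b']
```

The first two checks rule out "pass by value" in the copying sense, and the last one rules out classic "pass by reference", for ints and lists alike.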
The difference between the two cases is not how the object is passed to the function, but what happens inside the function. Because ints are immutable, the line obj += thing_to_add has to create a new int object and assign it to the local variable obj inside the function. This leaves the global variable x outside the function unchanged.
But when you pass a list, the += augmented assignment operator behaves differently. Because lists are mutable, it doesn't create a new list object; it modifies the existing list object in place. And so the change becomes visible outside the function via the global variable y.
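The difference between the int case and the list case can be observed directly with id(), which identifies an object for as long as it is alive. This is a minimal sketch; add_and_report and add_without_mutating are hypothetical names made up for the illustration.

```python
def add_and_report(obj, thing_to_add):
    """Apply += and report whether obj still names the same object afterwards."""
    before = id(obj)
    obj += thing_to_add
    return id(obj) == before

def add_without_mutating(obj, thing_to_add):
    """Use plain + instead of +=, which builds a new object even for lists."""
    obj = obj + thing_to_add  # rebinds the local name only
    return obj

x = 100
y = ['a', 'b']

int_same = add_and_report(x, 1)        # a new int object was created
list_same = add_and_report(y, ['c'])   # the existing list was modified
print(int_same, list_same)  # False True
print(x)  # 100
print(y)  # ['a', 'b', 'c']

z = ['a', 'b']
result = add_without_mutating(z, ['c'])
print(z)       # ['a', 'b']
print(result)  # ['a', 'b', 'c']
```

The last part shows that the surprise comes from += on a mutable object, not from how the list was passed: with obj = obj + thing_to_add the caller's list stays as it was.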
Is this good? Bad? Both?
It's neither good nor bad. It's just different from the old-school 1970s style of pass by value and pass by reference as used in Pascal and BASIC. All different evaluation strategies and parameter passing strategies have their strengths and weaknesses. The strategy used by Python (and Java, JavaScript, Perl, Ruby and many others) is convenient for the writers of the interpreter or compiler, and it matches naturally with the way people think about objects in the real world. And if you really need to match the behaviour of other languages:
- to simulate pass by value, just make a copy of your object before passing it;
- to simulate pass by reference, put your value inside a list, pass the list, and then use that list item as an indirect reference to the variable.
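
Both workarounds can be sketched in a few lines. The function and variable names here (append_item, increment, original, counter) are hypothetical, chosen only for the illustration, and copy.copy makes a shallow copy, so nested mutable objects would need copy.deepcopy instead.

```python
import copy

def append_item(items, item):
    """Mutate whatever list object is passed in."""
    items.append(item)

def increment(box):
    """Treat the one-element list box as an indirect reference to a value."""
    box[0] += 1

# Simulating pass by value: hand the function a copy, keep the original.
original = ['a', 'b']
append_item(copy.copy(original), 'c')
print(original)  # ['a', 'b']

# Simulating pass by reference: wrap the value in a list and pass the list.
counter = [100]
increment(counter)
print(counter[0])  # 101
```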
[1] Note for Java experts: I'm referring to "boxed" Java values, or objects, not unboxed low-level machine values, which are passed by value (i.e. by making a copy of the actual value 100).