I am considering moving from MATLAB to Python/numpy for data analysis and numerical simulations. I have used MATLAB (and SML-NJ) for years and am very comfortable in a functional environment without side effects (barring I/O), but I am a little wary of side effects in Python. Can people share their favorite gotchas regarding side effects and, if possible, how they got around them? As an example, I was a bit surprised when I tried the following code in Python:

lofls = [[]] * 4    # an accident waiting to happen!
lofls[0].append(7)  # not what I was expecting...
print(lofls)        # gives [[7], [7], [7], [7]]
# instead, I should have done this (I think)
lofls = [[] for x in range(4)]
lofls[0].append(7)  # only appends to the first list
print(lofls)        # gives [[7], [], [], []]

thanks in advance

+6  A: 

Confusing references to the same (mutable) object with references to separate objects is indeed a "gotcha" (suffered by all non-functional languages, ones which have mutable objects and, of course, references). A frequently seen bug in beginners' Python code is misusing a default value which is mutable, e.g.:

def addone(item, alist=[]):
    alist.append(item)
    return alist

This code may be correct if the purpose is to have addone keep its own state (and return the one growing list to successive callers), much as static data would work in C; it's not correct if the coder is wrongly assuming that a new empty list will be made at each call.
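If a fresh list per call is what's wanted, the usual idiom is a None sentinel. Here is a minimal sketch contrasting the two behaviors; the names addone_shared and addone_fresh are mine, purely for illustration:

```python
def addone_shared(item, alist=[]):
    # The default list is created once, when the def statement runs,
    # so every call that omits alist appends to that same list.
    alist.append(item)
    return alist


def addone_fresh(item, alist=None):
    # None is a sentinel meaning "no list supplied": build a new one per call.
    if alist is None:
        alist = []
    alist.append(item)
    return alist


shared_first = addone_shared(1)
shared_second = addone_shared(2)
print(shared_first is shared_second)  # True: both names refer to the one default list
print(shared_second)                  # [1, 2]

fresh_first = addone_fresh(1)
fresh_second = addone_fresh(2)
print(fresh_first, fresh_second)      # [1] [2]
```

The sentinel version costs two extra lines but makes the "new list each call" intent explicit.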

Raw beginners used to functional languages can also be confused by the command-query separation design decision in Python's built-in containers: mutating methods that don't have anything in particular to return (i.e., the vast majority of mutating methods) return nothing (specifically, they return None) -- they're doing all their work "in-place". Bugs coming from misunderstanding this are easy to spot, e.g.

alist = alist.append(item)

is pretty much guaranteed to be a bug -- it appends an item to the list referred to by name alist, but then rebinds name alist to None (the return value of the append call).
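A short sketch of that convention, using only built-ins: the mutating methods append and sort return None, while sorted is the non-mutating counterpart that returns a new list (the variable names are just for illustration):

```python
alist = [3, 1, 2]

result = alist.append(0)   # mutates alist in place
print(result)              # None
print(alist)               # [3, 1, 2, 0]

# list.sort() follows the same convention; sorted() returns a new list instead.
in_place = alist.sort()
print(in_place)            # None
print(alist)               # [0, 1, 2, 3]

original = [3, 1, 2]
ordered = sorted(original)
print(original, ordered)   # [3, 1, 2] [1, 2, 3]
```

Coming from a functional background, reaching for sorted (and for expressions that build new lists) rather than the in-place methods keeps code closer to the side-effect-free style.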

While the first issue I mentioned is about an early binding that may mislead people who think the binding is late, there are issues that go the other way, where people expect an early binding while the binding is, in fact, late. For example (with a hypothetical GUI framework...):

for i in range(10):
    Button(text="Button #%s" % i,
           click=lambda: say("I'm #%s!" % i))

This will show ten buttons saying "Button #0", "Button #1", etc., but, when clicked, each and every one of them will say it's #9 -- because the i within the lambda is late bound (with a lexical closure). A fix is to take advantage of the fact that default values for arguments are early-bound (as I pointed out about the first issue!-) and change the last line to

           click=lambda i=i: say("I'm #%s!" % i))

Now lambda's i is an argument with a default value, not a free variable (looked up by lexical closure) any more, and so the code works as intended (there are other ways too, of course).
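To try this without any GUI, here is a minimal sketch in which say is a hypothetical stand-in that just returns the message. It shows the late-bound lambda, the default-argument fix, and functools.partial as one of the other ways:

```python
from functools import partial


def say(i):
    # Hypothetical stand-in for the GUI's say(): returns the message.
    return "I'm #%s!" % i


late = []
early = []
bound = []
for i in range(3):
    late.append(lambda: say(i))        # i is looked up when the lambda is called
    early.append(lambda i=i: say(i))   # i's current value is stored as a default
    bound.append(partial(say, i))      # partial captures the value right now

print([f() for f in late])    # ["I'm #2!", "I'm #2!", "I'm #2!"]
print([f() for f in early])   # ["I'm #0!", "I'm #1!", "I'm #2!"]
print([f() for f in bound])   # ["I'm #0!", "I'm #1!", "I'm #2!"]
```

partial has the small advantage that the captured value can't be overridden by accident through an extra keyword argument named i.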

Alex Martelli
The problem of mutable defaults is so common that it appears in all the Python tutorials I have seen, and I have been scared straight by it. The rule of thumb you list regarding "in-place" modification is a good one, and I think it will take me very far. The late binding in the lambda expression you give is, however, very puzzling to me; it is the reverse of what happens in MATLAB, and I am not sure I like it. Them's the breaks, though.
shabbychef
@shabbychef, the value bound to a name is always looked up the moment it's needed -- no earlier (and of course no later;-). Free variables used in functions (lambda or otherwise) that are local variables in a lexically-containing "outer" function are no exception to this rule! A function's body (again, lambda or not) always executes when the function's called, no earlier (and, of course, no later) -- so clearly that's when the values of free variables are needed, so of course that's when they're looked up. What other behavior could be at all consistent?
Alex Martelli
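The lookup rule described in the comment above can be seen in a few lines. This is only an illustrative sketch; make_reader and read are made-up names:

```python
def make_reader():
    x = 1

    def read():
        # x is a free variable: its value is fetched each time read() runs.
        return x

    before = read()
    x = 2            # rebind x in the enclosing scope after read was defined
    after = read()
    return before, after


print(make_reader())  # (1, 2)
```

The inner function never stored the value 1; it only knows where to find x when it is called.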
Oddly enough, the behaviour of MATLAB in this regard is what I have come to know and tolerate: one can define an anonymous function (essentially a `lambda`) that uses variables from the outer scope, and they are bound to the values they had when the anonymous function was created. In fact, one can save the anonymous function to a file (essentially pickling it), close MATLAB, turn off the computer, come back, reboot, reload, and the anonymous function still works and still has the variables from its context. I think MathWorks had to do this because their anonymous functions are so limited in usefulness otherwise...
shabbychef
While the behavior of anonymous functions is as you say in MATLAB, nested functions in MATLAB can have those same variables change as part of the invocation of the nested function. The reason the anonymous functions behave the way they do is so they can be saved, loaded, shared, etc. without any extra baggage. MATLAB had an inline object before where you had to pass along values for each of the variables and it confused folks and was often quite unwieldy to use.
Loren
The ability to save anonymous functions with frozen context is actually useful; I like functional programming, and am scared senseless of side effects, thus this question.
shabbychef