Closures are an incredibly useful language feature. They let us do clever things that would otherwise take a lot of code, and they often let us write code that is more elegant and clearer. In Python, closures are read-only affairs; that is, a function defined inside another lexical scope cannot change variables outside of its local scope. Can someone explain why that is? There have been situations in which I would like to create a closure that modifies variables in the outer scope, but it wasn't possible. I realize that in almost all cases (if not all of them), this behavior can be achieved with classes, but it is often not as clean or as elegant. Why can't I do it with a closure?

Here is an example of a read/write closure:

def counter():
    count = 0
    def c():
        count += 1
        return count
    return c

This is the current behavior when you call it:

>>> c()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in c
UnboundLocalError: local variable 'count' referenced before assignment

What I'd like it to do instead is this:

>>> c()
1
>>> c()
2
>>> c()
3
+11  A: 

nonlocal in 3.x should remedy this.

Ignacio Vazquez-Abrams
That's fantastic news. I will look into it. Does that mean that "nonlocal" signals the interpreter to create a thunk for this function? Do you know how this works?
Benson
I don't know all the details, but `nonlocal` should indicate to the compiler that it will need to walk the scopes in order to find the name.
Ignacio Vazquez-Abrams
Right. AFAIK, the only way for that to work is to essentially package up the whole scope of the enclosing function in a thunk before it gets GC'd.
Benson
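
For what it's worth, CPython does not package up the whole enclosing scope; it stores each closed-over name in a cell object attached to the inner function. A minimal sketch (CPython, illustrative only) of how to inspect this:

def counter():
    count = 0
    def c():
        return count              # reading count makes it a free variable of c
    return c

c = counter()
print(c.__code__.co_freevars)          # ('count',)
print(c.__closure__)                   # (<cell at 0x...: int object at ...>,)
print(c.__closure__[0].cell_contents)  # 0
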
+13  A: 

You could do this and it would work more or less the same way:

class counter(object):
    def __init__(self, count=0):
        self.count = count
    def __call__(self):
        self.count += 1
        return self.count    

Or, a bit of a hack:

def counter():
    count = [0]
    def incr(n):
        n[0] += 1
        return n[0]
    return lambda: incr(count)

I'd go with the first solution.
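
Either version can be driven exactly like the closure in the question; a quick usage sketch for illustration:

c = counter()           # works the same for the class version and the list hack
print([c(), c(), c()])  # [1, 2, 3]

d = counter()           # each counter keeps its own independent state
print(d())              # 1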

EDIT: That's what I get for not reading the big blog of text.

Anyway, the reason Python closures are rather limited is "because Guido felt like it." Python was designed in the early 90s, in the heyday of OO. Closures were rather low on the list of language features people wanted. As functional ideas like first-class functions, closures, and other things make their way into mainstream popularity, languages like Python have had to tack them on, so their use may be a bit awkward, because that's not what the language was designed for.

<rant on="Python scoping">

Also, Python (2.x) has rather odd (in my opinion) ideas about scoping that interfere with a sane implementation of closures, among other things. It always bothers me that this:

new = [x for x in old]

Leaves the name x defined in the scope we used it in, even though a comprehension is (in my opinion) conceptually a smaller scope. (Though Python gets points for consistency, as doing the same thing with a for loop has the same behavior. The only way to avoid this is to use map.)

Anyway, </rant>
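
For reference, this is what the leak looks like in a 2.x interpreter (Python 3 later gave comprehensions their own scope, so the name no longer escapes):

>>> old = [1, 2, 3]
>>> new = [x for x in old]
>>> x    # the comprehension variable has leaked into the enclosing scope
3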

Chris Lutz
+1 for the hack
chrispy
All very good info, and I appreciate the insight. I also agree with your assessment of python's scoping. Still, the question was "why can't I?", not "how do I?". I want to know why the language was designed like this.
Benson
+9  A: 

I would use a generator:

>>> def counter():
    count = 0
    while True:
        count += 1
        yield(count)

>>> c = counter()
>>> c.next()
1
>>> c.next()
2
>>> c.next()
3

EDIT: I believe the ultimate answer to your question is PEP-3104:

In most languages that support nested scopes, code can refer to or rebind (assign to) any name in the nearest enclosing scope. Currently, Python code can refer to a name in any enclosing scope, but it can only rebind names in two scopes: the local scope (by simple assignment) or the module-global scope (using a global declaration).

This limitation has been raised many times on the Python-Dev mailing list and elsewhere, and has led to extended discussion and many proposals for ways to remove this limitation. This PEP summarizes the various alternatives that have been suggested, together with advantages and disadvantages that have been mentioned for each.

Before version 2.1, Python's treatment of scopes resembled that of standard C: within a file there were only two levels of scope, global and local. In C, this is a natural consequence of the fact that function definitions cannot be nested. But in Python, though functions are usually defined at the top level, a function definition can be executed anywhere. This gave Python the syntactic appearance of nested scoping without the semantics, and yielded inconsistencies that were surprising to some programmers -- for example, a recursive function that worked at the top level would cease to work when moved inside another function, because the recursive function's own name would no longer be visible in its body's scope. This violates the intuition that a function should behave consistently when placed in different contexts.
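
The two rebinding mechanisms the quoted paragraph describes can be shown side by side; a small sketch for contrast (nonlocal in 3.x adds the missing third option, rebinding a name in an enclosing function scope):

count = 0  # module-global

def bump_global():
    global count    # 'global' lets the function rebind the module-level name
    count += 1
    return count

def bump_local():
    count = 99      # a plain assignment always creates or rebinds a local name
    return count    # the module-level count is untouched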

jbochi
+1 I was so focused on the desired usage syntax that I forgot about generators. Always good.
Chris Lutz
That's a good answer, but it's not an answer to the questions I asked. I want to know why Python is designed like this, not how to work around it. I could do it with a class, with a generator (which just instantiates a class, so it amounts to the same thing), or probably in other ways. But I don't care about a counter instance; I want to know why the language was designed this way.
Benson
Python 2.x thinks you are declaring a new variable if the name is not in the current scope. That's why Python 3.0 introduced the `nonlocal` keyword, to work around this issue.
jbochi
Again, "python thinks you're declaring a new variable" is the what, not the why. I got that much from the error message. :-) Still, I appreciate the insight.
Benson
Well, Guido is a human and humans make mistakes sometimes! :-P
jbochi
+1 That's the best answer I've seen so far.
Benson
+4  A: 

Functions can also have attributes, so this would work, too:

def counter():
    counter.count = 0
    def c():
        counter.count += 1
        return counter.count
    return c

However, in this specific example, I'd use a generator as suggested by jbochi.

As for why, I can't say for sure, but I imagine it's not an explicit design choice, but rather a remnant of Python's sometimes-odd scoping rules (and especially the somewhat-odd evolution of its scoping rules).

mipadi
+1 for a way to do it I didn't know before (and for agreeing on Python's inane scoping rules).
Chris Lutz
That's an interesting idea, but it's effectively just a class declaration. In fact, it's disturbingly similar to JavaScript-style classes. But thanks for the insight; I didn't know you could add arbitrary attributes to functions.
Benson
Yeah, it's basically a class. And I've found that when you have an urge to use function attributes, you should either (a) use a generator (as in this example), or (b) use a class. *But* I thought it was worth pointing out that function attributes *do* exist and could be used as a solution to your problem (even though I agree that they're kind of ugly).
mipadi
+13  A: 

To expand on Ignacio's answer:

def counter():
    count = 0
    def c():
        nonlocal count
        count += 1
        return count
    return c

x = counter()
print([x(),x(),x()])

gives [1, 2, 3] in Python 3; invocations of counter() give independent counters. Other solutions, especially those using itertools or yield, are more idiomatic.

sdcvvc
I realize that this isn't the best example I could have used, it just seemed like the simplest example to write. This is exactly what I was looking for (along with a reason they didn't add 'nonlocal' years ago). The counter example was just a piece of throwaway code to get the point across. Thank you for making this explicit.
Benson
+2  A: 

This behavior is quite thoroughly explained in the official Python tutorial as well as in the Python execution model. In particular, from the tutorial:

A special quirk of Python is that – if no global statement is in effect – assignments to names always go into the innermost scope.

However, this does not say anything about why it behaves in this way.

Some more information comes from PEP 3104, which tries to tackle this situation for Python 3.0.
There, you can see that it is this way because, at a certain point in time, it was seen as the best solution rather than introducing classic statically nested scopes (see Re: Scoping (was Re: Lambda binding solved?)).

That said, I also have my own interpretation.
Python implements namespaces as dictionaries; when a lookup for a variable fails in the innermost scope, it tries the enclosing one, and so on, until it reaches the builtins.
However, binding a variable is a completely different matter, because you need to specify a particular namespace, and that is always the innermost one (unless you use the "global" statement, which means it is always the global namespace).
Ultimately, these different algorithms for looking up and binding variables are the reason closures are read-only in Python.
But, again, this is just my speculation :-)
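
A small experiment supports this reading: the binding decision is made per function at compile time, so a single assignment anywhere in a body makes the name local for the entire body, which is exactly where the UnboundLocalError comes from. A sketch:

x = 10

def reads_only():
    return x        # no binding in this function, so lookup walks outward

def rebinds():
    print(x)        # UnboundLocalError: the assignment below makes x local here
    x = 20

print(reads_only()) # 10
rebinds()           # raises UnboundLocalError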

Roberto Liffredo
Interesting ideas. From my perspective the most logical thing to do when you have an lvalue that is being assigned to is to first look it up in the symbol table. If it exists, use it, if not, create it at the innermost scope. But maybe that's just me...?
Benson
I fully agree with you, and I have been bitten several times by a similar assumption in Python. For this reason, I have found it very useful to reason in terms of dictionaries: as soon as you see the memory model that way, things become much clearer.
Roberto Liffredo
A: 

It is not so much that they are read-only as that the scoping is stricter than you realize. If you can't use Python 3's nonlocal, you can at least use explicit scoping. Python 2.6.1, with explicit scoping at the module level:

>>> import sys
>>> def counter():
...     sys.modules[__name__].count = 0
...     def c():
...         sys.modules[__name__].count += 1
...         return sys.modules[__name__].count
...     sys.modules[__name__].c = c
...     
>>> counter()
>>> c()
1
>>> c()
2
>>> c()
3

A little more work is required to have a more restricted scope for the count variable, instead of using a pseudo-global module variable (still Python 2.6.1):

>>> def counter():
...     class c():
...         def __init__(self):
...             self.count = 0
...     cinstance = c()
...     def iter():
...         cinstance.count += 1
...         return cinstance.count
...     return iter
... 
>>> c = counter()
>>> c()
1
>>> c()
2
>>> c()
3
>>> d = counter()
>>> d()
1
>>> c()
4
>>> d()
2
cjrh
It turns out the inability to affect things in a non-local, non-global scope is the same as being read-only. Explicit scoping at the module level just gives you another kind of global; it doesn't give you a new instance for each execution of the counter() function.
Benson
Conceded. Please see my new additional example above. I am trying to get out what you said you would like to achieve in your original question.
cjrh
It occurs to me now that my new example, using the internal `cinstance` to hold the state of `count`, is almost identical to your first example in the question, except that `count` is now maintained inside an instance. There is a separate instance for each invocation of `counter()`, which is what you wanted originally. What puzzles me is why the internal instance variable `cinstance` is handled differently with respect to scoping than the original `count` in your first example; as you can test yourself, the code above does *not* produce `UnboundLocalError` on `cinstance`.
cjrh
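
The difference comes down to reading a free variable versus rebinding it: `cinstance.count += 1` only reads the name `cinstance` and then mutates the object it refers to, whereas `count += 1` tries to rebind `count` itself, and only rebinding triggers the compiler's local-variable rule. A small sketch of the contrast (illustrative only):

def outer():
    box = [0]
    n = 0
    def mutate():
        box[0] += 1   # reads the free variable box, then mutates the list: fine
        return box[0]
    def rebind():
        n += 1        # tries to rebind n, so the compiler makes it local: UnboundLocalError
        return n
    return mutate, rebind

mutate, rebind = outer()
print(mutate())  # 1
rebind()         # raises UnboundLocalError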