views:

3945

answers:

7

Hi,

I've been programming for many years, and recently started learning Python. The following code works as expected in both python 2.5 and 3.0 (on OS X if that matters):

a, b, c = (1, 2, 3)

print(a, b, c)

def test():
    print(a)
    print(b)
    print(c)    # (A)
    #c+=1       # (B)
test()

However, when I uncomment line (B), I get an UnboundLocalError: 'c' not assigned at line (A). The values of a and b are printed correctly. This has me completely baffled for two reasons:

1) Why is there an runtime error thrown at line (A) because of a later statement on line (B)?

2) Why are variables a and b printed as expected, while c raises an error?

The only explanation I can come up with is that a local variable c is created by the assignment c+=1, which takes precedent over the "global" variable c even before the local variable is created. Of course, it doesn't make sense for a variable to "steal" scope before it exists.

Could someone please explain this behavior? Thank you very much, brainfsck

+27  A: 

Python treats variables in functions differently depending on whether you assign values to them from within the function or not. If you assign any value to a variable, it is treated by default as a local variable. Therefore, when you uncomment the line, you are attempting to reference a local variable before any value has been assigned to it.

If you want the variable c to refer to the global c put

global c

as the first line of the function.

As of python 3, there is now

nonlocal c

that you can use to refer to the nearest enclosing (not necessarily global) scope.

recursive
Thanks. Quick question. Does this imply that Python decides the scope of each variable before running a program? Before running a function?
brainfsck
The variable scope decision is made by the compiler, which normally runs once when you first start the program. However it is worth keeping in mind that the compiler might also run later if you have "eval" or "exec" statements in your program.
Greg Hewgill
Okay thank you. I guess "interpreted language" doesn't imply quite as much as I had thought.
brainfsck
Ah that 'nonlocal' keyword was exactly what I was looking for, it seemed Python was missing this. Presumably this 'cascades' through each enclosing scope that imports the variable using this keyword?
Brendan
+3  A: 

Python has rather interesting behavior when you try traditional global variable semantics. I don't remember the details, but you can read the value of a variable declared in 'global' scope just fine, if you want to modify it, you have to use the global keyword. Try changing test() to this:

def test():
    global c
    print(a)
    print(b)
    print(c)    # (A)
    c+=1       # (B)

Also, the reason you are getting this error is because you can also declare a new variable inside that function with the same name as a 'global' one, and it would be completely separate. The interpreter thinks you are trying to make a new variable in this scope called 'c' and modify it all in one operation, which isn't allowed in python because this new 'c' wasn't initialized.

Mongoose
Thanks for your response, but I don't think it explains why the error is thrown at line (A), where I'm merely trying to print a variable. The program never gets to line (B) where it is trying to modify an un-initialized variable.
brainfsck
Python will read, parse and turn the whole function into internal bytecode before it starts running the program, so the fact that the "turn c to local variable" happens textually after the printing of the value doesn't, as it were, matter.
Vatine
+13  A: 

okay, here's the deal. Python is a little weird, in that it keeps everything in a dictionary for the various scopes. The original a,b,c are in the uppermost scope and so in that uppermost dictionary. The function has its own dictionary. When you reach the print(a) and print(b) statements, there's nothing by that name in the dictionary, so Python looks up the list and finds them in the clobal dictionary.

Now we get to c+=1, which is, of course, equivalent to c=c+1. When Python scans that line, it says "ahah, there's a variable named c, I'll put it into my local scope dictionary." Then when it goes looking for a value for c for the c on the right hand side of the assignment, it finds its local variable named c, which has no value yet, and so throws the error.

The statement global c mentioned above simply tells the parser that it uses the c from the global scope and so doesn't need a new one.

The reason it says there's an issue on the line it does is because it is effectively looking for the names before it tries to generate code, and so in some sense doesn't think it's really doing that line yet. I'd argue that is a useability bug, but it's generally a good practice to just learn not to take a compiler's messages too seriously.

If it's any comfort, I spent probably a day digging and experimenting with this same issue before I found something Guido had written about the dictionaries that Explained Everything.

Update, see comments:

It doesn't scan the code twice, but it does scan the code in two phases, lexing and parsing.

Consider how the parse of this cline of code works. The lexer reads the source text and breaks it into lexemes, the "smallest components" of the grammar. So when it hits the line

c+=1

it breaks it up into something like

SYMBOL(c) OPERATOR(+=) DIGIT(1)

The parser eventually wants to make this into a parse tree and execute it, but since it's an assignment, before it does, it looks for the name c in the local dictionary, doesn't see it, and inserts it in the dictionary, marking it as uninitialized. In a fullly compiled language, it would just go into the symbol table and wait for the parse, but since it WON'T have the luxury of a second pass, the lexer does a little extra work to make life easier later on. Only, then it sees the OPERATOR, sees that the rules say "if you have an operator += the left hand side must have been initialized" and says "whoops!"

The point here is that it hasn't really started the parse of the line yet. This is all happening sort of preparatory to the actual parse, so the line counter hasn't advanced to the next line. Thus when it signals the error, it still thinks its on the previous line.

As I say, you could argue it's a usability bug, but its actually a fairly common thing. Some compilers are more honest about it and say "error on or around line XXX", but this one doesn't.

Charlie Martin
Okay thank you for your response; it cleared some things up for me about scopes in python. However, I still don't understand why the error is raised at line (A) rather than line (B). Does Python create its variable scope dictionary BEFORE running the program?
brainfsck
No, it's on the expression level. I'll add to the answer, I don't think I can fit this in a comment.
Charlie Martin
A: 

The Python interpreter will read a function as a complete unit. I think of it as reading it in two passes, once to gather its closure (the local variables), then again to turn it into byte-code.

As I'm sure you were already aware, any name used on the left of a '=' is implicitly a local variable. More than once I've been caught out by changing a variable access to a += and it's suddenly a different variable.

I also wanted to point out it's not really anything to do with global scope specifically. You get the same behaviour with nested functions.

James Hopkin
+10  A: 

Taking a look at the disassembly may clarify what is happening:

>>> def f():
...    print a
...    print b
...    a = 1

>>> import dis
>>> dis.dis(f)

  2           0 LOAD_FAST                0 (a)
              3 PRINT_ITEM
              4 PRINT_NEWLINE

  3           5 LOAD_GLOBAL              0 (b)
              8 PRINT_ITEM
              9 PRINT_NEWLINE

  4          10 LOAD_CONST               1 (1)
             13 STORE_FAST               0 (a)
             16 LOAD_CONST               0 (None)
             19 RETURN_VALUE

As you can see, the bytecode for accessing a is LOAD_FAST, and for b, LOAD_GLOBAL. This is because the compiler has identified that a is assigned to within the function, and classified it as a local variable. The access mechanism for locals is fundamentally different for globals - they are statically assigned an offset in the frame's variables table, meaning lookup is a quick index, rather than the more expensive dict lookup as for globals. Because of this, Python is reading the print a line as "get the value of local variable 'a' held in slot 0, and print it", and when it detects that this variable is still uninitialised, raises an exception.

Brian
I'd not seen the dis module before. Interesting stuff - thanks for that :-)
Jon Cage
Ditto. I've got to look at that.
Charlie Martin
+1  A: 

This is not a direct answer to your question, but it is closely related, as it's another gotcha caused by the relationship between augmented assignment and function scopes.

In most cases, you tend to think of augmented assignment (a += b) as exactly equivalent to simple assignment (a = a + b). It is possible to get into some trouble with this though, in one corner case. Let me explain:

The way python's simple assignment works means that if a is passed into a function (like func(a); note that python is always pass-by-reference), then a = a + b will not modify the a that is passed in. Instead, it will just modify the local pointer to a.

But if you use a += b, then it is sometimes implemented as:

a = a + b

or sometimes (if the method exists) as:

a.__iadd__(b)

In the first case (as long as a is not declared global), there are no side-effects outside local scope, as the assignment to "a" is just a pointer update.

In the second case, a will actually modify itself, so all references to a will point to the modified version. This is demonstrated by the following code:

def copy_on_write(a):
      a = a + a
def inplace_add(a):
      a += a
a = [1]
copy_on_write(a)
print a # [1]
inplace_add(a)
print a # [1, 1]
b = 1
copy_on_write(b)
print b # [1]
inplace_add(b)
print b # 1

So the trick is to avoid augmented assignment on function arguments (I try to only use it for local/loop variables). Use simple assignment, and you will be safe from ambiguous behaviour.

alsuren
+1  A: 

Here are two links that may help

1: docs.python.org/3.1/faq/programming.html?highlight=nonlocal#why-am-i-getting-an-unboundlocalerror-when-the-variable-has-a-value

2: docs.python.org/3.1/faq/programming.html?highlight=nonlocal#how-do-i-write-a-function-with-output-parameters-call-by-reference

link one describes the error UnboundLocalError. Link two can help with with re-writing your test function. Based on link two, the original problem could be rewritten as:

>>> a, b, c = (1, 2, 3)
>>> print (a, b, c)
(1, 2, 3)
>>> def test (a, b, c):
...     print (a)
...     print (b)
...     print (c)
...     c += 1
...     return a, b, c
...
>>> a, b, c = test (a, b, c)
1
2
3
>>> print (a, b ,c)
(1, 2, 4)
mcdon