I ran across this case of UnboundLocalError recently, which seems strange:

import pprint

def main():
    if 'pprint' in globals(): print 'pprint is in globals()'
    pprint.pprint('Spam')
    from pprint import pprint
    pprint('Eggs')

if __name__ == '__main__': main()

Which produces:

pprint is in globals()
Traceback (most recent call last):
  File "weird.py", line 9, in <module>
    if __name__ == '__main__': main()
  File "weird.py", line 5, in main
    pprint.pprint('Spam')
UnboundLocalError: local variable 'pprint' referenced before assignment

pprint is clearly bound in globals, and is going to be bound in locals in the following statement. Can someone offer an explanation of why it isn't happy resolving pprint to the binding in globals here?

Edit: Thanks to the good responses I can clarify my question with relevant terminology:

At compile time the identifier pprint is marked as local to the frame. Does the execution model have no distinction where within the frame the local identifier is bound? Can it say, "refer to the global binding up until this bytecode instruction, at which point it has been rebound to a local binding," or does the execution model not account for this?

+3  A: 

It looks like Python sees the from pprint import pprint line and marks pprint as a name local to main() before executing any code. Since Python treats pprint as a local variable, referencing it with pprint.pprint() before it is "assigned" by the from..import statement raises that error.

That's as much sense as I can make of that.

The moral, of course, is to always put those import statements at the top of the scope.
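A minimal sketch of the failure and of the "import at the top of the scope" fix, written for Python 3 (an assumption; the question uses Python 2 print statements). The function names broken and fixed are made up for illustration.

```python
import pprint  # module-level binding: the name refers to the module


def broken():
    # The from-import further down makes 'pprint' local to the whole function,
    # so this line fails before the import has had a chance to run.
    pprint.pprint('Spam')
    from pprint import pprint
    pprint('Eggs')


def fixed():
    # Import first: the local name is bound before anything uses it.
    from pprint import pprint
    pprint('Eggs')


try:
    broken()
except UnboundLocalError as exc:
    error = exc
else:
    error = None

fixed()
```

The exact wording of the error message differs between Python versions, so the sketch only checks the exception type.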

Triptych
So the conclusion we'd draw from this analysis is that a global identifier cannot be used earlier in a local scope if it is bound later in that same local scope?
cdleary
+1: Parts of scope resolution are resolved at compile time.
S.Lott
@cdleary: If, within the same function, you use a name first from the global scope and then from the local scope, the compiler has no implicit way to determine that the first occurrence refers to the global namespace and the second to the local one. Hence it's treated as an UnboundLocalError.
JV
+3  A: 

Well, that was interesting enough for me to experiment a bit, and I read through http://docs.python.org/reference/executionmodel.html

Then I did some tinkering with your code here and there; this is what I could find:

code:

import pprint

def two():
    from pprint import pprint
    print globals()['pprint']
    pprint('Eggs')
    print globals()['pprint']

def main():
    if 'pprint' in globals():
        print 'pprint is in globals()'
    global  pprint
    print globals()['pprint']
    pprint.pprint('Spam')
    from pprint import pprint
    print globals()['pprint']
    pprint('Eggs')

def three():
    print globals()['pprint']
    pprint.pprint('Spam')

if __name__ == '__main__':
    two()
    print('\n')
    three()
    print('\n')
    main()

output:

<module 'pprint' from '/usr/lib/python2.5/pprint.pyc'>
'Eggs'
<module 'pprint' from '/usr/lib/python2.5/pprint.pyc'>

<module 'pprint' from '/usr/lib/python2.5/pprint.pyc'>
'Spam'

pprint is in globals()
<module 'pprint' from '/usr/lib/python2.5/pprint.pyc'>
'Spam'
<function pprint at 0xb7d596f4>
'Eggs'

In two(), from pprint import pprint binds pprint as a local name but does not override the name pprint in globals, since the global keyword is not used in the scope of two().

In three(), since there is no binding of the name pprint in the local scope, it defaults to the global name pprint, which is a module.

Whereas in main(), the global keyword is used first, so all references to pprint in the scope of main() refer to the global name pprint. As we can see, it is a module at first and is then overridden in the global namespace with a function when we do the from pprint import pprint.
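The effect of global in main() can be checked directly. This is a small Python 3 sketch (an assumption, since the code above is Python 2); rebind_globally is a hypothetical name.

```python
import pprint


def rebind_globally():
    global pprint              # every 'pprint' in this function is the module-level name
    before = pprint            # still the module at this point
    from pprint import pprint  # rebinds the *global* name to the function
    return before, pprint


before, after = rebind_globally()
module_level_value = globals()['pprint']
```

After the call, the module-level name pprint no longer refers to the module, which is exactly the change visible in the output above.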

This may not answer the question as such, but it's an interesting fact nevertheless, I think.

=====================

Edit: Another interesting thing.

If you have a module say:

mod1

from datetime import    datetime

def foo():
    print "bar"

and another module, say:

mod2

import  datetime
from mod1 import *

if __name__ == '__main__':
    print datetime.datetime.now()

At first sight this seems correct, since you have imported the module datetime in mod2.

But if you try to run mod2 as a script, it raises an error:

Traceback (most recent call last):
  File "mod2.py", line 5, in <module>
    print datetime.datetime.now()
AttributeError: type object 'datetime.datetime' has no attribute 'datetime'

because the second import, from mod1 import *, has overridden the name datetime in the namespace, so the name bound by the first import datetime no longer refers to the module.

Moral: the order of imports, the kind of import (from x import *), and awareness of what the imported modules themselves import all matter.
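The shadowing can be reproduced in a single file. This is a hedged Python 3 sketch: instead of two real files it registers a stand-in module under the made-up name mod1_demo and runs the two imports of mod2 in a fresh namespace with exec.

```python
import sys
import types
import datetime as real_datetime

# Stand-in for mod1: a module whose namespace holds the datetime *class*.
mod1 = types.ModuleType('mod1_demo')  # hypothetical module name
mod1.datetime = real_datetime.datetime
sys.modules['mod1_demo'] = mod1

# Stand-in for mod2: the same two imports, in the same order.
namespace = {}
exec('import datetime\nfrom mod1_demo import *', namespace)

# The star import ran last, so 'datetime' is now the class, not the module.
shadowed = namespace['datetime']
```

Swapping the order of the two import lines would leave datetime bound to the module instead, since the last binding wins.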

JV
+1 for the great link (and thanks for your comment in @Triptych's answer -- it helped me clarify the question).
cdleary
+4  A: 

Where's the surprise? Any variable global to a scope that you reassign within that scope is marked local to that scope by the compiler.
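The same thing happens with an ordinary variable and no import at all. A minimal Python 3 sketch, with made-up names:

```python
counter = 10  # module-level binding


def read_only():
    # No assignment to 'counter' in this function: resolved as a global.
    return counter + 1


def read_then_assign():
    # Fails: 'counter' is local here because of the assignment on the next line.
    value = counter + 1
    counter = value
    return counter


try:
    read_then_assign()
except UnboundLocalError:
    failed = True
else:
    failed = False
```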

If imports were handled differently, that would be surprising, IMHO.

It may make a case for not naming modules after symbols used therein, or vice versa, though.

Albert Visser
You are definitely right - it is the same behavior with _any_ variable reassigned in a scope. +1
Roberto Liffredo
A: 

This question got answered several weeks ago, but I think I can clarify the answers a little. First some facts.

1: In Python,

import foo

is almost exactly the same as

foo = __import__("foo", globals(), locals(), [], -1)

2: When executing code in a function, if Python encounters a variable that isn't assigned anywhere in the function, it looks in the global scope.

3: Python has an optimization it uses for functions called "locals". When Python compiles a function, it keeps track of all the variables you assign to. It gives each of these variables an index from a monotonically increasing counter local to the function. When Python runs the function, it creates an array with as many slots as there are local variables, initializes each slot with a special value that means "has not been assigned to yet", and stores the values of those variables there. If you reference a local that hasn't been assigned to yet, Python sees that special value and raises an UnboundLocalError exception.
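Facts 2 and 3 can be observed from the compiled code object. This is a rough Python 3 sketch using the standard dis module; exact opcode names vary between CPython versions, so it only checks whether a global load is used for the name. The function names are made up.

```python
import dis
import pprint


def uses_global():
    # 'pprint' is never assigned here, so it is looked up as a global.
    return pprint.pformat('Spam')


def uses_local():
    # The from-import below assigns 'pprint', so it gets a local slot.
    value = pprint.pformat('Spam')
    from pprint import pprint
    return value


def ops_for_name(func, name):
    """Collect the opcode names of instructions whose argument is `name`."""
    return {ins.opname for ins in dis.get_instructions(func) if ins.argval == name}


global_ops = ops_for_name(uses_global, 'pprint')
local_ops = ops_for_name(uses_local, 'pprint')
local_names = uses_local.__code__.co_varnames  # names that were given local slots
```

Note that uses_local is only compiled here, never called; calling it would raise the UnboundLocalError from the question.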

The stage is now set. Your "from pprint import pprint" is really a form of assignment. So Python creates a local variable called "pprint" which occludes the global variable. Then, when you refer to "pprint.pprint" in the function, you hit the special value and Python throws the exception. If you didn't have that import statement in the function, Python would use the normal look-in-locals-first-then-look-in-globals resolution and find the pprint module in globals.
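To see that the from-import is a form of assignment, it can be spelled out by hand. This is a Python 3 sketch, so the last argument to __import__ is 0 rather than the -1 that Python 2 used; pprint_func is a made-up name.

```python
import sys

# Roughly what 'from pprint import pprint' does: import the module, then
# bind one of its attributes to a name in the current scope.
module = __import__('pprint', globals(), locals(), ['pprint'], 0)
pprint_func = module.pprint

from pprint import pprint

same = pprint is pprint_func  # both names refer to the same function object
```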

To disambiguate this you can use the "global" keyword. Of course by now you've already worked past your problem, and I don't know whether you really needed "global" or if some other approach was called for.
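One possible "other approach", as a sketch only: import the function under a different local name so it never clashes with the global module name. The alias fmt is made up, and pformat is used instead of pprint so the results can be returned rather than printed (Python 3 assumed).

```python
import pprint


def main():
    lines = [pprint.pformat('Spam')]   # global name: still the module
    from pprint import pformat as fmt  # local alias (hypothetical name), no clash
    lines.append(fmt('Eggs'))
    return lines


result = main()
```

Unlike global, this leaves the module-level pprint untouched.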

Larry Hastings