tags:

views:

148

answers:

2

In a big application I am working, several people import same modules differently e.g. import x or from y import x the side effects of that is x is imported twice and may introduce very subtle bugs, if someone is relying on global attributes

e.g. suppose I have a package mypakcage with three file mymodule.py, main.py and init.py

mymodule.py contents

l = []
class A(object): pass

main.py contents

def add(x):
    from mypackage import mymodule
    mymodule.l.append(x)
    print "updated list",mymodule.l

def get():
    import mymodule
    return mymodule.l

add(1)
print "lets check",get()

add(1)
print "lets check again",get()

it prints

updated list [1]
lets check []
updated list [1, 1]
lets check again []

because now there are two lists in two different modules, similarly class A is different To me it looks serious enough because classes itself will be treated differently e.g. below code prints False

def create():
    from mypackage import mymodule
    return mymodule.A()

def check(a):
    import mymodule
    return isinstance(a, mymodule.A)

print check(create())

Question:

Is there any way to avoid this? except enforcing that module should be imported one way onyl. Can't this be handled by python import mechanism, I have seen several bugs related to this in django code and elsewhere too.

+2  A: 

I can only replicate this if main.py is the file you are actually running. In that case you will get the current directory of main.py on the sys path. But you apparently also have a system path set so that mypackage can be imported.

Python will in that situation not realize that mymodule and mypackage.mymodule is the same module, and you get this effect. This change illustrates this:

def add(x):
    from mypackage import mymodule
    print "mypackage.mymodule path", mymodule
    mymodule.l.append(x)
    print "updated list",mymodule.l

def get():
    import mymodule
    print "mymodule path", mymodule
    return mymodule.l

add(1)
print "lets check",get()

add(1)
print "lets check again",get()


$ export PYTHONPATH=.
$ python  mypackage/main.py 

mypackage.mymodule path <module 'mypackage.mymodule' from '/tmp/mypackage/mymodule.pyc'>
mymodule path <module 'mymodule' from '/tmp/mypackage/mymodule.pyc'>

But add another mainfile, in the currect directory:

realmain.py:
from mypackage import main

and the result is different:

mypackage.mymodule path <module 'mypackage.mymodule' from '/tmp/mypackage/mymodule.pyc'>
mymodule path <module 'mypackage.mymodule' from '/tmp/mypackage/mymodule.pyc'>

So I suspect that you have your main python file within the package. And in that case the solution is to not do that. :-)

Lennart Regebro
hmmm actually that is not the case, I will see how to replicate it with main.py outside package
Anurag Uniyal
no you are right :)
Anurag Uniyal
+1  A: 

Each module namespace is imported only once. Issue is, you're importing them differently. On the first you're importing from the global package, and on the second you're doing a local, non-packaged import. Python sees modules as different. The first import is internally cached as mypackage.mymodule and the second one as mymodule only.

A way to solve this is to always use absolute imports. That is, always give your module absolute import paths from the top-level package onwards:

def add(x):
    from mypackage import mymodule
    mymodule.l.append(x)
    print "updated list",mymodule.l

def get():
    from mypackage import mymodule
    return mymodule.l

Remember that your entry point (the file you run, main.py) also should be outside the package. When you want the entry point code to be inside the package, usually you use a run a small script instead. Example:

runme.py, outside the package:

from mypackage.main import main
main()

And in main.py you add:

def main():
    # your code

I find this document by Jp Calderone to be a great tip on how to (not) structure your python project. Following it you won't have issues. Pay attention to the bin folder - it is outside the package. I'll reproduce the entire text here:

Filesystem structure of a Python project

Do:

  • name the directory something related to your project. For example, if your project is named "Twisted", name the top-level directory for its source files Twisted. When you do releases, you should include a version number suffix: Twisted-2.5.
  • create a directory Twisted/bin and put your executables there, if you have any. Don't give them a .py extension, even if they are Python source files. Don't put any code in them except an import of and call to a main function defined somewhere else in your projects.
  • If your project is expressable as a single Python source file, then put it into the directory and name it something related to your project. For example, Twisted/twisted.py. If you need multiple source files, create a package instead (Twisted/twisted/, with an empty Twisted/twisted/__init__.py) and place your source files in it. For example, Twisted/twisted/internet.py.
  • put your unit tests in a sub-package of your package (note - this means that the single Python source file option above was a trick - you always need at least one other file for your unit tests). For example, Twisted/twisted/test/. Of course, make it a package with Twisted/twisted/test/__init__.py. Place tests in files like Twisted/twisted/test/test_internet.py.
  • add Twisted/README and Twisted/setup.py to explain and install your software, respectively, if you're feeling nice.

Don't:

  • put your source in a directory called src or lib. This makes it hard to run without installing.
  • put your tests outside of your Python package. This makes it hard to run the tests against an installed version.
  • create a package that only has a __init__.py and then put all your code into __init__.py. Just make a module instead of a package, it's simpler.
  • try to come up with magical hacks to make Python able to import your module or package without having the user add the directory containing it to their import path (either via PYTHONPATH or some other mechanism). You will not correctly handle all cases and users will get angry at you when your software doesn't work in their environment.
nosklo