In Python, if you want to programmatically import a module, you can do:

module = __import__('module_name')

If you want to import a submodule, you would think it would be a simple matter of:

module = __import__('module_name.submodule')

Of course, this doesn't work; you just get module_name again. You have to do:

module = __import__('module_name.submodule', fromlist=['blah'])

Why? The actual value of fromlist doesn't seem to matter at all, as long as it's non-empty. What is the point of requiring an argument, then ignoring its values?

Most stuff in Python seems to be done for good reason, but for the life of me, I can't come up with any reasonable explanation for this behavior to exist.

+1  A: 

The answer can be found in the documentation for __import__:

The fromlist should be a list of names to emulate from name import ..., or an empty list to emulate import name.

When importing a module from a package, note that __import__('A.B', ...) returns package A when fromlist is empty, but its submodule B when fromlist is not empty.
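A quick way to check the documented behaviour is to try it on a standard-library package. This is only a sketch: json.decoder stands in for A.B, and 'blah' is an arbitrary placeholder name as in the question.

```python
# json.decoder is an ordinary submodule of the standard-library json package.
top = __import__("json.decoder")                     # empty fromlist
sub = __import__("json.decoder", fromlist=["blah"])  # non-empty fromlist, contents arbitrary

print(top.__name__)        # json
print(sub.__name__)        # json.decoder
print(top.decoder is sub)  # True: the submodule is loaded either way
```

Both calls import the submodule; the only difference is which module object is handed back.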

So basically, that's just how the implementation of __import__ works: if you want the submodule, you pass a fromlist containing something you want to import from the submodule, and __import__ then returns the submodule instead of the top-level package.

Further explanation

I think the semantics exist so that the most relevant module is returned. In other words, say I have a package foo containing module bar with function baz. If I:

import foo.bar

Then I refer to baz as

foo.bar.baz()

This is like __import__("foo.bar", fromlist=[]).

If instead I import with:

from foo import bar

Then I refer to baz as bar.baz()

Which would be similar to __import__("foo.bar", fromlist=["something"]).

If I do:

from foo.bar import baz

Then I refer to baz as

baz()

Which is like __import__("foo.bar", fromlist=["baz"]).

So in the first case, I'd have to use the fully-qualified name, hence __import__ returns the first module name you'd use to refer to the imported elements, that being foo. In the last case, bar is the most specific module containing the imported elements, so it makes sense that __import__ would return the foo.bar module.

The second case is a little weird, but I am guessing it was written that way to support importing a module using the from <package> import <module> syntax, and in that case bar is still the most specific module to return.
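To make the second and third cases concrete, here is a small sketch using json and json.decoder in place of foo and bar (my substitution, not part of the original example). As far as I can tell, in CPython the statement from foo import bar actually asks for the package and names the submodule in fromlist, but the module you end up with is the same object either way.

```python
# "from json import decoder": request the package, name the submodule in fromlist.
pkg = __import__("json", fromlist=["decoder"])
decoder = pkg.decoder

# The form used above: request the submodule with any non-empty fromlist.
same_decoder = __import__("json.decoder", fromlist=["something"])

# "from json.decoder import JSONDecoder"
JSONDecoder = __import__("json.decoder", fromlist=["JSONDecoder"]).JSONDecoder

print(pkg.__name__)             # json
print(decoder is same_decoder)  # True
print(JSONDecoder.__module__)   # json.decoder
```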

mipadi
Saying "that's just how the implementation works" doesn't answer my question. Why does it work that way? Saying "to emulate the from name import …" form is closer, but under what circumstances would you need that? The fromlist doesn't make a whit of difference to how __import__ actually works, so I don't see where there's a case where you'd need to pass it to emulate anything, except what should be the obvious behavior of the function.
ieure
You're right, it is begging the question. I updated my answer to give a more relevant reply.
mipadi
+7  A: 

In fact, the behaviour of __import__() is entirely because of the implementation of the import statement, which calls __import__(). There are basically five slightly different ways __import__() can be called by import (with two main categories):

import pkg
import pkg.mod
from pkg import mod, mod2
from pkg.mod import func, func2
from pkg.mod import submod

In the first and the second case, the import statement should assign the "left-most" module object to the "left-most" name: pkg. After import pkg.mod you can do pkg.mod.func() because the import statement introduced the local name pkg, which is a module object that has a mod attribute. So, the __import__() function has to return the "left-most" module object so it can be assigned to pkg. Those two import statements thus translate into:

pkg = __import__('pkg')
pkg = __import__('pkg.mod')
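The binding can be observed by running a dotted import in a fresh namespace. A minimal sketch, with the standard-library json.decoder standing in for pkg.mod:

```python
namespace = {}
exec("import json.decoder", namespace)

print("json" in namespace)     # True: only the left-most name is bound
print("decoder" in namespace)  # False
print(namespace["json"] is __import__("json.decoder"))  # True: same object __import__ returns
print(namespace["json"].decoder.__name__)               # json.decoder, reached as an attribute
```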

In the third, fourth and fifth case, the import statement has to do more work: it has to assign to (potentially) multiple names, which it has to get from the module object. The __import__() function can only return one object, and there's no real reason to make it retrieve each of those names from the module object (and it would make the implementation a lot more complicated.) So the simple approach would be something like (for the third case):

tmp = __import__('pkg')
mod = tmp.mod
mod2 = tmp.mod2

However, that won't work if pkg is a package and mod or mod2 are modules in that package that are not already imported, as they are in the third and fifth case. The __import__() function needs to know that mod and mod2 are names that the import statement will want to have accessible, so that it can see if they are modules and try to import them too. So the call is closer to:

tmp = __import__('pkg', fromlist=['mod', 'mod2'])
mod = tmp.mod
mod2 = tmp.mod2

which causes __import__() to try and load pkg.mod and pkg.mod2 as well as pkg (but if mod or mod2 don't exist, it's not an error in the __import__() call; producing an error is left to the import statement.) But that still isn't the right thing for the fourth and fifth example, because if the call were so:

tmp = __import__('pkg.mod', fromlist=['submod'])
submod = tmp.submod

then tmp would end up being pkg, as before, and not the pkg.mod module you want to get the submod attribute from. The implementation could have decided to make it so the import statement does extra work, splitting the package name on . like the __import__() function already does and traversing the names, but this would have meant duplicating some of the effort. So, instead, the implementation made __import__() return the right-most module instead of the left-most one if and only if fromlist is passed and not empty.
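The effect of fromlist can be reproduced with a throwaway package on disk. This is a sketch under assumptions: the package name demo_fromlist_pkg, its layout and the name "missing" are all made up for the demonstration, and the behaviour described is what I would expect on CPython 3.

```python
import importlib
import pathlib
import sys
import tempfile

# Build a throwaway package: demo_fromlist_pkg/mod/submod.py (hypothetical names).
root = pathlib.Path(tempfile.mkdtemp())
mod_dir = root / "demo_fromlist_pkg" / "mod"
mod_dir.mkdir(parents=True)
(root / "demo_fromlist_pkg" / "__init__.py").write_text("")
(mod_dir / "__init__.py").write_text("")
(mod_dir / "submod.py").write_text("VALUE = 42\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Empty fromlist: the left-most package comes back and submod is not loaded.
left = __import__("demo_fromlist_pkg.mod")
loaded_before = hasattr(left.mod, "submod")
print(left.__name__)   # demo_fromlist_pkg
print(loaded_before)   # False

# Non-empty fromlist: the right-most module comes back and submod is imported too.
# "missing" names nothing that exists, yet the call itself does not raise.
right = __import__("demo_fromlist_pkg.mod", fromlist=["submod", "missing"])
print(right.__name__)             # demo_fromlist_pkg.mod
print(right.submod.VALUE)         # 42
print(hasattr(right, "missing"))  # False
```

The missing name only becomes an ImportError when an actual from demo_fromlist_pkg.mod import missing statement tries to fetch the attribute.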

(The import pkg as p and from pkg import mod as m syntax doesn't change anything about this story except which local names get assigned to -- the __import__() function sees nothing different when as is used, it all remains in the import statement implementation.)
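To see what each form of the statement really passes, builtins.__import__ can be wrapped temporarily. The helper first_import_call below is a hypothetical name of my own, and the sketch assumes CPython, where the import statement looks up __import__ in builtins.

```python
import builtins

def first_import_call(statement):
    """Run one import statement; return the (name, fromlist) it passed to __import__."""
    code = compile(statement, "<spy>", "exec")
    calls = []
    original = builtins.__import__

    def spy(name, globals=None, locals=None, fromlist=(), level=0):
        calls.append((name, fromlist))
        return original(name, globals, locals, fromlist, level)

    builtins.__import__ = spy
    try:
        exec(code, {"__builtins__": builtins})
    finally:
        builtins.__import__ = original  # always restore the real function
    # Imports nested inside freshly loaded modules also pass through the spy,
    # so only the first recorded call belongs to the statement itself.
    return calls[0]

for statement in (
    "import json.decoder",
    "import json.decoder as d",
    "from json import decoder",
    "from json import decoder as d",
    "from json.decoder import JSONDecoder",
):
    print(statement, "->", first_import_call(statement))
```

On the interpreters I know of, the plain and as variants record identical arguments, the import forms pass None as fromlist, and the from forms pass a tuple of the requested names.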

Thomas Wouters
Thank you very much for your detailed explanation.
ieure