tags:

views:

354

answers:

3

Hi All,

I am writing a program that categorizes a list of Python files by which modules they import. As such I need to scan the collection of .py files ad return a list of which modules they import. As an example, if one of the files I import has the following lines:

import os
import sys, gtk

I would like it to return:

["os", "sys", "gtk"]

I played with modulefinder and wrote:

from modulefinder import ModuleFinder

finder = ModuleFinder()
finder.run_script('testscript.py')

print 'Loaded modules:'
for name, mod in finder.modules.iteritems():
    print '%s ' % name,

but this returns more than just the modules used in the script. As an example in a script which merely has:

import os
print os.getenv('USERNAME')

The modules returned from the ModuleFinder script return:

tokenize  heapq  __future__  copy_reg  sre_compile  _collections  cStringIO  _sre  functools  random  cPickle  __builtin__  subprocess  cmd  gc  __main__  operator  array  select  _heapq  _threading_local  abc  _bisect  posixpath  _random  os2emxpath  tempfile  errno  pprint  binascii  token  sre_constants  re  _abcoll  collections  ntpath  threading  opcode  _struct  _warnings  math  shlex  fcntl  genericpath  stat  string  warnings  UserDict  inspect  repr  struct  sys  pwd  imp  getopt  readline  copy  bdb  types  strop  _functools  keyword  thread  StringIO  bisect  pickle  signal  traceback  difflib  marshal  linecache  itertools  dummy_thread  posix  doctest  unittest  time  sre_parse  os  pdb  dis

...whereas I just want it to return 'os', as that was the module used in the script.

Can anyone help me achieve this?

UPDATE: I just want to clarify that I would like to do this without running the Python file being analyzed, and just scanning the code.

+4  A: 

IMO the best way todo this is to use the http://furius.ca/snakefood/ package. The author has done all of the required work to get not only directly imported modules but it uses the AST to parse the code for runtime dependencies that a more static analysis would miss.

Worked up a command example to demonstrate:

sfood ./example.py | sfood-cluster > example.deps

That will generate a basic dependency file of each unique module. For even more detail use:

sfood -r -i ./example.py | sfood-cluster > example.deps

To walk a tree and find all imports, you can also do this in code: Please NOTE - The AST chunks of this routine were lifted from the snakefood source which has this copyright: Copyright (C) 2001-2007 Martin Blais. All Rights Reserved.

 import os
 import compiler
 from compiler.ast import Discard, Const
 from compiler.visitor import ASTVisitor

 def pyfiles(startPath):
     r = []
     d = os.path.abspath(startPath)
     if os.path.exists(d) and os.path.isdir(d):
         for root, dirs, files in os.walk(d):
             for f in files:
                 n, ext = os.path.splitext(f)
                 if ext == '.py':
                     r.append([d, f])
     return r

 class ImportVisitor(object):
     def __init__(self):
         self.modules = []
         self.recent = []
     def visitImport(self, node):
         self.accept_imports()
         self.recent.extend((x[0], None, x[1] or x[0], node.lineno, 0)
                            for x in node.names)
     def visitFrom(self, node):
         self.accept_imports()
         modname = node.modname
         if modname == '__future__':
             return # Ignore these.
         for name, as_ in node.names:
             if name == '*':
                 # We really don't know...
                 mod = (modname, None, None, node.lineno, node.level)
             else:
                 mod = (modname, name, as_ or name, node.lineno, node.level)
             self.recent.append(mod)
     def default(self, node):
         pragma = None
         if self.recent:
             if isinstance(node, Discard):
                 children = node.getChildren()
                 if len(children) == 1 and isinstance(children[0], Const):
                     const_node = children[0]
                     pragma = const_node.value
         self.accept_imports(pragma)
     def accept_imports(self, pragma=None):
         self.modules.extend((m, r, l, n, lvl, pragma)
                             for (m, r, l, n, lvl) in self.recent)
         self.recent = []
     def finalize(self):
         self.accept_imports()
         return self.modules

 class ImportWalker(ASTVisitor):
     def __init__(self, visitor):
         ASTVisitor.__init__(self)
         self._visitor = visitor
     def default(self, node, *args):
         self._visitor.default(node)
         ASTVisitor.default(self, node, *args) 

 def parse_python_source(fn):
     contents = open(fn, 'rU').read()
     ast = compiler.parse(contents)
     vis = ImportVisitor() 

     compiler.walk(ast, vis, ImportWalker(vis))
     return vis.finalize()

 for d, f in pyfiles('/Users/bear/temp/foobar'):
     print d, f
     print parse_python_source(os.path.join(d, f)) 
bear
This seems to be a third party tool for generated graphs, but not something I can use in my code other than using it with subprocess. Is this what you recommend?
Jono Bacon
ah - your looking for something really basic to include - let me chew on that
bear
of course it would help if *run* my samples - working on an actual working version :/
bear
ok, modified my example to use the AST walking code from snakefood and wrapped it in some basic file walking code
bear
+2  A: 

Well, you could always write a simple script that searches the file for import statements. This one finds all imported modules and files, including those imported in functions or classes:

def find_imports(toCheck):
    """
    Given a filename, returns a list of modules imported by the program.
    Only modules that can be imported from the current directory
    will be included. This program does not run the code, so import statements
    in if/else or try/except blocks will always be included.
    """
    import imp
    importedItems = []
    with open(toCheck, 'r') as pyFile:
        for line in pyFile:
            # ignore comments
            line = line.strip().partition("#")[0].partition("as")[0].split(' ')
            if line[0] == "import":
                for imported in line[1:]:
                    # remove commas (this doesn't check for commas if
                    # they're supposed to be there!
                    imported = imported.strip(", ")
                    try:
                        # check to see if the module can be imported
                        # (doesn't actually import - just finds it if it exists)
                        imp.find_module(imported)
                        # add to the list of items we imported
                        importedItems.append(imported)
                    except ImportError:
                        # ignore items that can't be imported
                        # (unless that isn't what you want?)
                        pass

    return importedItems

toCheck = raw_input("Which file should be checked: ")
print find_imports(toCheck)

This doesn't do anything for from module import something style imports, though that could easily be added, depending on how you want to deal with those. It also doesn't do any syntax checking, so if you have some funny business like import sys gtk, os it will think you've imported all three modules even though the line is an error. It also doesn't deal with try/except type statements with regards to import - if it could be imported, this function will list it. It also doesn't deal well with multiple imports per line if you use the as keyword. The real issue here is that I'd have to write a full parser to really do this correctly. The given code works in many cases, as long as you understand there are definite corner cases.

One issue is that relative imports will fail if this script isn't in the same directory as the given file. You may want to add the directory of the given script to sys.path.

Daniel G
+1  A: 

It depends how thorough you want to me. Used modules is a turing complete problem: some python code uses lazy importing to only import things they actually use on a particular run, some generate things to import dynamically (e.g. plugin systems).

python -v will trace import statements - its arguably the simplest thing to check.