views: 114
answers: 3

My project targets a low-cost, low-resource embedded device. It depends on a relatively large and sprawling Python code base, of which I use only a quite specific subset of the APIs.

I am keen to prune this library back to its bare minimum by executing my test suite under a coverage tool like Ned Batchelder's coverage or figleaf, then scripting the removal of unused code from the various modules/files. This will help not only with understanding the library's internals, but also make writing any patches easier. Ned actually refers to using coverage tools to "reverse engineer" complex code in one of his online talks.

My question to the SO community is whether people have experience of using coverage tools in this way that they wouldn't mind sharing. What are the pitfalls, if any? Is coverage a good choice, or would I be better off investing my time in figleaf?

The end-game is to be able to automatically generate a new source tree for the library, based on the original tree, but only including the code actually used when I run nosetests.
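To make the end-game concrete, here is a minimal sketch of the kind of script I have in mind, assuming a reasonably recent coverage.py API (Coverage.load/get_data, CoverageData.measured_files/lines); LIB_ROOT is a placeholder. It lists library files for which the test run recorded no executed lines, i.e. candidates for exclusion from the pruned tree:

    import os
    import coverage

    LIB_ROOT = "/path/to/library"      # placeholder: root of the library checkout

    cov = coverage.Coverage()          # reads the .coverage file left by the test run
    cov.load()
    data = cov.get_data()
    executed = {f for f in data.measured_files() if data.lines(f)}

    for dirpath, _, filenames in os.walk(LIB_ROOT):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.abspath(os.path.join(dirpath, name))
            if path not in executed:
                print("prune candidate:", path)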

If anyone has developed a tool that does a similar job for their Python applications and libraries, it would be terrific to get a baseline from which to start development.

Hopefully my description makes sense to readers...

A: 

I haven't used coverage for pruning, but it seems like it should do well. I've used the combination of nosetests + coverage, and it worked better for me than figleaf. In particular, I found the HTML report from nosetests + coverage helpful - it should help you see where the unused portions of the library are.
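If it is useful as a starting point, the same report can be driven from the coverage API directly - a minimal sketch, assuming the older coverage.coverage class (newer releases spell it coverage.Coverage) and a placeholder tests/ directory:

    import coverage
    import nose

    cov = coverage.coverage()                 # coverage.Coverage in newer releases
    cov.start()
    nose.run(argv=["nosetests", "tests/"])    # run the suite programmatically
    cov.stop()
    cov.save()
    cov.html_report(directory="htmlcov")      # browse htmlcov/index.html for per-file hit maps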

bstpierre
Thanks for the feedback concerning figleaf. I shall give both a shot, but focus my initial attention on coverage.
landstatic
+5  A: 

What you want isn't "test coverage", it is the transitive closure of "can call" from the root of the computation. (In threaded applications, you have to include "can fork").

You want to designate some small set (perhaps only one) of functions that make up the entry points of your application, and then trace through all possible callees (conditional or unconditional) of that small set. This is the set of functions you must have.

Python makes this very hard in general (IIRC, I'm not a deep Python expert) because of dynamic dispatch and especially because of "eval". Reasoning about which functions can get called can be pretty tricky for a static analyzer applied to a highly dynamic language.

One might use test coverage as a way to seed the "can call" relation with specific "did call" facts; that could catch a lot of dynamic dispatches (dependent on your test suite coverage). Then the result you want is the transitive closure of "can or did" call. This can still be erroneous, but is likely to be less so.
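A toy sketch of that closure computation (the dict-based call graph and the function names are placeholders; in practice the edges would come from static analysis seeded with calls recorded during the test runs):

    def reachable(call_graph, roots):
        """call_graph: dict mapping function name -> set of callee names."""
        keep, stack = set(roots), list(roots)
        while stack:
            fn = stack.pop()
            for callee in call_graph.get(fn, ()):
                if callee not in keep:
                    keep.add(callee)
                    stack.append(callee)
        return keep

    # Placeholder graph and root:
    graph = {"main": {"parse", "render"}, "parse": {"tokenize"}, "render": set()}
    print(reachable(graph, ["main"]))   # prints the four reachable functions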

Once you get a set of "necessary" functions, the next problem will be removing the unnecessary functions from the source files you have. If the number of files you start with is large, the manual effort to remove the dead stuff may be pretty high. Worse, you're likely to revise your application, and then the answer as to what to keep changes. So for every change (release), you need to reliably recompute this answer.

My company builds a tool that does this analysis for Java packages (with appropriate caveats regarding dynamic loads and reflection): the input is a set of Java files and (as above) a designated set of root functions. The tool computes the call graph, finds all dead member variables, and produces two outputs: a) the list of purportedly dead methods and members, and b) a revised set of files with all the "dead" stuff removed. If you believe a), then you use b). If you think a) is wrong, then you add the elements listed in a) to the set of roots and repeat the analysis until you think a) is right. To do this, you need a static analysis tool that parses Java, computes the call graph, and then revises the code modules to remove the dead entries. The basic idea applies to any language.

You'd need a similar tool for Python, I'd expect.
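As a very rough illustration of what the mechanical removal step could look like in Python using only the standard-library ast module (the KEEP set and module path are placeholders, and real code would have to handle methods, nested defs, classes and dynamically referenced names):

    import ast

    KEEP = {"parse", "tokenize"}          # placeholder: names the reachability analysis kept

    class Pruner(ast.NodeTransformer):
        def visit_FunctionDef(self, node):
            # Returning None deletes the node from the tree.
            return node if node.name in KEEP else None

    source = open("mymodule.py").read()   # placeholder module path
    tree = Pruner().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    print(ast.unparse(tree))              # ast.unparse needs Python 3.9+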

Maybe you can stick to just dropping files that are completely unused, although that may still be a lot of work.

Ira Baxter
+1 good analysis of the problem.
bstpierre
In theory, for my application the unit tests will be a very close approximation of the transitive closure you refer to - although, as you correctly point out, I can never be absolutely sure without a more rigorous analysis, with additional consideration of the exit possibilities (exceptions, finally-blocks etc...). Being able to prune entire branches whose purpose is orthogonal to the core bits I am using would be a useful filter. There are clearly risks with this that I need to consider more deeply.
landstatic
Even if I don't deploy the pruned tree, it will certainly focus my attention when it comes to debugging logic within the library or developing patches - which is a secondary goal.
landstatic
Using Python within an embedded context, I could also do something a little more radical: add a top-level try/except clause that handles any Exception not explicitly caught elsewhere and arising from a dodgy prune (sketched below). Then, even though the pruned code would run 99% of the time, in the event it breaks due to a bad prune it would fail gracefully and execute a recovery heuristic - e.g. restarting. Just a thought; needless to say I won't be rushing to deploy this code.
landstatic
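A minimal sketch of that last-resort handler (run_application is a hypothetical entry point into the pruned library; a real device would more likely defer to its watchdog or init system):

    import os
    import sys
    import logging

    logging.basicConfig(level=logging.ERROR)

    try:
        run_application()   # hypothetical entry point into the pruned library
    except Exception:
        logging.exception("unhandled error, possibly from an over-aggressive prune")
        # Crude restart heuristic; a real device would rate-limit or escalate this.
        os.execv(sys.executable, [sys.executable] + sys.argv)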
@landstatic: If the missing function is needed functionality, you can't recover gracefully. You didn't describe your embedded device; most of the ones I've experienced in industry are *ship-and-forget*. In particular, you can't backpatch them later, so your answer would be unacceptable.
Ira Baxter
@Ira our device is an Internet-enabled STB with upgradeable firmware so the conditions of use are not so restrictive. Thanks for your observations. Very interesting.
landstatic
+1  A: 

As others have pointed out, coverage can tell you what code has been executed. The trick for you is to be sure that your test suite truly exercises the code fully. The failure case here is over-pruning because your tests skipped some code that will really be needed in production.

Be sure to get the latest version of coverage.py (v3.4): it adds a new feature to indicate files that are never executed at all.
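A minimal sketch of how that looks through the API (the package name "mylib" and run_test_suite are placeholders; newer coverage releases spell the class coverage.Coverage):

    import coverage

    cov = coverage.coverage(source=["mylib"])   # source= pulls in files that were never imported
    cov.start()
    run_test_suite()                            # placeholder: however you drive nosetests
    cov.stop()
    cov.save()
    cov.report(show_missing=False)              # modules with no executed lines show up at 0%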

BTW: for a first-cut prune, Python provides a neat trick: remove all the .pyc files in your source tree, then run your tests. Files that still have no .pyc file were clearly not executed!
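A quick sketch of that trick (CPython 2.x writes the .pyc next to the source; Python 3 puts compiled files in __pycache__, so the check would need adjusting there):

    import os

    def never_imported(root):
        """List .py files with no neighbouring .pyc after the test run."""
        unused = []
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                if name.endswith(".py") and name + "c" not in filenames:
                    unused.append(os.path.join(dirpath, name))
        return unused

    print("\n".join(never_imported("/path/to/library")))   # placeholder path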

Ned Batchelder
Thanks for those tips Ned.
landstatic