
I manage the testing for a very large financial pricing system. Recently our HQ have insisted that we verify that every single part of our project has a meaningful test in place. At the very least they want a system which guarantees that when we change something we can spot unintentional changes to other sub-systems. Preferably they want something which validates the correctness of every component in our system.

That's obviously going to be quite a lot of work! It could take years, but for this kind of project it's worth it.

I need to find out which parts of our code are not covered by any of our unit-tests. If I knew which parts of my system were untested then I could set about developing new tests which would eventually move me towards my goal of complete test-coverage.

So how can I go about running this kind of analysis? What tools are available to me?

I use Python 2.4 on 32-bit Windows XP.

UPDATE0:

Just to clarify: We have a very comprehensive unit-test suite (plus a separate and very comprehensive regtest suite which is outside the scope of this exercise). We also have a very stable continuous integration platform (built with Hudson) which is designed to split up and run standard Python unit-tests across our test facility: approx 20 PCs built to the company spec.

The object of this exercise is to plug any gaps in our Python unittest suite (only) so that every component has some degree of unittest coverage. Other developers will be taking responsibility for the non-Python components of the project (which are also outside of scope).

"Component" is intentionally vague: Sometime it will be a class, other time an entire module or assembly of modules. It might even refer to a single financial concept (e.g. a single type of financial option or a financial model used by many types of option). This cake can be cut in many ways.

"Meaningful" tests (to me) are ones which validate that the function does what the developer originally intended. We do not want to simply reproduce the regtests in pure python. Often the developer's intent is not immediatly obvious, hence the need to research and clarify anything which looks vague and then enshrine this knowledge in a unit-test which makes the original intent quite explicit.

+3  A: 

For the code coverage alone, you could use coverage.py.
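
A minimal sketch of driving it from Python, assuming the coverage.py 3.x API (the class is spelled coverage.coverage() there; later releases use coverage.Coverage()) and a placeholder test-module name:

    import unittest
    import coverage                      # Ned Batchelder's coverage.py

    cov = coverage.coverage()            # coverage.Coverage() in newer releases
    cov.start()

    # Build and run the suite however you normally do; the module name here
    # is a placeholder for one of your own test modules.
    suite = unittest.TestLoader().loadTestsFromName('tests.test_pricing')
    unittest.TextTestRunner(verbosity=2).run(suite)

    cov.stop()
    cov.save()
    cov.report(show_missing=True)        # per-module list of statements never run

The "missing" column of that report is exactly the list of untested lines you are after.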

As for coverage.py vs figleaf:

figleaf differs from the gold standard of Python coverage tools ('coverage.py') in several ways. First and foremost, figleaf uses the same criterion for "interesting" lines of code as the sys.settrace function, which obviates some of the complexity in coverage.py (but does mean that your "loc" count goes down). Second, figleaf does not record code executed in the Python standard library, which results in a significant speedup. And third, the format in which the coverage information is saved is very simple and easy to work with.

You might want to use figleaf if you're recording coverage from multiple types of tests and need to aggregate the coverage in interesting ways, and/or control when coverage is recorded. coverage.py is a better choice for command-line execution, and its reporting is a fair bit nicer.
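
To illustrate the "control when coverage is recorded" point, recording can be switched on and off around just the code you care about; treat the exact calls below as an assumption from memory of the figleaf docs rather than a verified API, and use the bundled figleaf2html script to produce a report afterwards:

    import figleaf

    figleaf.start()                      # begin recording coverage
    # ... exercise whichever subset of the system you care about ...
    figleaf.stop()                       # stop recording
    figleaf.write_coverage('.figleaf')   # append the results to the data file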

I guess both have their pros and cons.

Razzie
Thanks for the suggestion: Why use this rather than figleaf?
Salim Fadhley
I added a 'coverage.py vs figleaf' explanation in my answer. I don't think it actually matters much which one you choose. To be fair, I haven't used either of them, though these are the two main tools.
Razzie
coverage3 fixes a lot of previous shortcomings, i.e. not executing stdlib et al.
Almad
+1  A: 

Assuming you already have a relatively comprehensive test suite, there are tools for the Python part. The C part is much more problematic, depending on tool availability.

  • For Python unit tests, the coverage tools mentioned above (coverage.py or figleaf) will do the job.

  • For C code, it is difficult on many platforms because gprof, the GNU profiler, cannot handle code built with -fPIC. So you have to build every extension statically in this case, which is not supported by many extensions (see my blog post about numpy, for example). On Windows, there may be better code-coverage tools for compiled code, but that may require you to recompile the extensions with the MS compilers.

As for the "right" code coverage, I think a good balance is to avoid writing complicated unit tests as much as possible. If a unit test is more complicated than the thing it tests, then it is probably not a good test, or a broken test.

David Cournapeau
I only care about Python - and yes: We have a very comprehensive set of tests.
Salim Fadhley
The degree of complexity of the tests depends on what it is we are testing: some non-financial classes can be quite trivially tested, whereas we have some financial stuff which may need tests greatly in excess of the thing being tested. The reason for this is that there are so many interesting edge-cases which must be validated.
Salim Fadhley
The rationale is that if your tests are more complicated than the code, you start needing to test your unit tests.
David Cournapeau
+3  A: 

The first step would be writing meaningful tests. If you write tests only meant to reach full coverage, you'll be counter-productive; it will probably mean you focus on the unit's implementation details instead of its expectations.

BTW, I'd use nose as the unittest framework (http://somethingaboutorange.com/mrl/projects/nose/0.11.1/); its plugin system is very good and leaves the coverage option to you (--with-coverage for Ned's coverage, --with-figleaf for Titus's figleaf; support for coverage3 should be coming), and you can write plugins for your own build system, too.
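
For example (the package name below is a placeholder for your own code, and the flags are the standard ones shipped with nose's coverage plug-in), the whole existing unittest suite can be run under nose with coverage switched on:

    import nose

    # Plain unittest.TestCase classes are collected without modification.
    nose.run(argv=[
        'nosetests',
        '--with-coverage',           # use Ned's coverage.py via the plug-in
        '--cover-package=pricing',   # hypothetical package: limit the report to your code
        '--cover-erase',             # discard coverage data from earlier runs
    ])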

Almad
The tests already exist: It's 100% python standard-library unittest.
Salim Fadhley
Figleaf seems like a very good suggestion: I like the fact that it comes with a basic reporting tool. This can be adapted to suit my purposes.
Salim Fadhley
Well, nose is compatible with standard unittests, but you get some features for free (like plugins, as well as better test selection).
Almad
Is there a good guide for migrating unittest.TestCase to Nose?
Salim Fadhley
There is no need for migration - just run your test suite with the nosetests command.
Almad
+2  A: 

"every single part of our project has a meaningful test in place"

"Part" is undefined. "Meaningful" is undefined. That's okay, however, since it gets better further on.

"validates the correctness of every component in our system"

"Component" is undefined. But correctness is defined, and we can assign a number of alternatives to component. You only mention Python, so I'll assume the entire project is pure Python.

  • Validates the correctness of every module.

  • Validates the correctness of every class of every module.

  • Validates the correctness of every method of every class of every module.

You haven't asked about line of code coverage or logic path coverage, which is a good thing. That way lies madness.

"guarantees that when we change something we can spot unintentional changes to other sub-systems"

This is regression testing. That's a logical consequence of any unit testing discipline.

Here's what you can do.

  1. Enumerate every module. Create a unittest for that module that is just a unittest.main(). This should be quick -- a few days at most.

  2. Write a nice top-level unittest script that uses a testLoader to find all the unit tests in your tests directory and runs them through the text runner (see the sketch after this list). At this point, you'll have a lot of files -- one per module -- but no actual test cases. Getting the testLoader and the top-level script to work will take a few days. It's important to have this overall harness working.

  3. Prioritize your modules. A good rule is "most heavily reused". Another rule is "highest risk from failure". Another rule is "most bugs reported". This takes a few hours.

  4. Start at the top of the list. Write a TestCase per class with no real methods or anything. Just a framework. This takes a few days at most. Be sure the docstring for each TestCase positively identifies the Module and Class under test and the status of the test code. You can use these docstrings to determine test coverage.
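
A minimal sketch of steps 1, 2 and 4, assuming a flat tests/ directory of test_*.py files; every file, package and class name below is a placeholder:

    # -- steps 1 and 4: tests/test_curve_builder.py (one such file per module) --
    import unittest

    class TestCurveBuilder(unittest.TestCase):
        """Module under test: pricing.curve_builder
        Class under test: CurveBuilder
        Status: skeleton only -- no assertions written yet."""
        pass

    if __name__ == '__main__':
        unittest.main()

    # -- step 2: run_all_tests.py, the top-level harness ----------------------
    import glob, os, sys, unittest

    def build_suite(test_dir='tests'):
        """Load every tests/test_*.py module and collect its TestCases."""
        sys.path.insert(0, test_dir)
        loader = unittest.TestLoader()
        suite = unittest.TestSuite()
        for path in glob.glob(os.path.join(test_dir, 'test_*.py')):
            name = os.path.splitext(os.path.basename(path))[0]
            suite.addTests(loader.loadTestsFromModule(__import__(name)))
        return suite

    if __name__ == '__main__':
        unittest.TextTestRunner(verbosity=1).run(build_suite())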

At this point you'll have two parallel tracks. You have to actually design and implement the tests. Depending on the class under test, you may have to build test databases, mock objects, all kinds of supporting material.

  • Testing Rework. Starting with your highest priority untested module, start filling in the TestCases for each class in each module.

  • New Development. For every code change, a unittest.TestCase must be created for the class being changed.

The test code follows the same rules as any other code. Everything is checked in at the end of the day. It has to run -- even if the tests don't all pass.

Give the test script to the product manager (not the QA manager, the actual product manager who is responsible for shipping product to customers) and make sure they run the script every day and find out why it didn't run or why tests are failing.

The actual running of the master test script is not a QA job -- it's everyone's job. Every manager at every level of the organization has to be part of the daily build script output. All of their jobs have to depend on "all tests passed last night". Otherwise, the product manager will simply pull resources away from testing and you'll have nothing.

S.Lott
Thanks for your comments: There is zero danger of this project being de-funded. Testing is mandated from the highest authorities and everybody on the team buys into it. In theory every non-trivial class SHOULD have a unit-test suite dedicated to some form of validation. How "meaningful" these tests are varies: For example the simplest tests might just verify that a function can be called and return appropriate types. Better tests might validate the mathematical properties of the returned data. Obviously we want more of the latter.
Salim Fadhley
Also, we have a very comprehensive unit-test suite (plus a separate regtest suite which is outside the scope of this exercise). The object of this exercise is to plug any gaps in our unittest suite (only) so that every component has some degree of unittest coverage. "Component" is intentionally vague: sometimes it will be a class, other times an entire module or assembly of modules. "Meaningful" tests (to me) are ones which validate that the function does what the developer originally intended. This will require much research.
Salim Fadhley
@Salim Fadhley: Please update the question with these additional facts. It's not at all clear from the question that you have any tests in place.
S.Lott
+2  A: 

FWIW, this is what we do. Since I don't know about your unit-test and regression-test setup, you'll have to decide for yourself whether this is helpful.

  • Every Python package has UnitTests.
  • We automatically detect unit tests using nose. Nose automagically detects standard Python unit tests (basically everything that looks like a test), so we don't miss any unit-tests. Nose also has a plug-in concept so that you can produce, e.g., nice output.
  • We strive for 100% coverage in unit-testing. To this end, we use coverage.py to check, because a nose plug-in provides the integration.
  • We have set up Eclipse (our IDE) to automatically run nose whenever a file changes so that the unit-tests always get executed, which shows code-coverage as a by-product.
stephan