I know similar questions have been asked before but they don't really have the information I'm looking for - I'm not asking about the mechanics of how to generate unit tests, but whether it's a good idea.

I've written a module in Python which contains objects representing physical constants and units of measurement. A lot of the units are formed by adding on prefixes to base units - e.g. from m I get cm, dm, mm, hm, um, nm, pm, etc. And the same for s, g, C, etc. Of course I've written a function to do this since the end result is over 1000 individual units and it would be a major pain to write them all out by hand ;-) It works something like this (not the actual code):

# `prefixes` is a sequence of (prefix, multiplier) pairs,
# e.g. [('c', 1e-2), ('k', 1e3), ('m', 1e-3), ...]
def add_unit(name, value):
    # Bind the base unit, then every prefixed variant, as module-level names.
    globals()[name] = value
    for pfx, multiplier in prefixes:
        globals()[pfx + name] = multiplier * value

add_unit('m', <definition of a meter>)
add_unit('g', <definition of a gram>)
add_unit('s', <definition of a second>)
# etc.

The problem comes in when I want to write unit tests for these units (no pun intended), to make sure they all have the right values. If I write code that automatically generates a test case for every unit individually, any problems that are in the unit generation function are likely to also show up in the test generation function. But given the alternative (writing out all 1000+ tests by hand), should I just go ahead and write a test generation function anyway, check it really carefully and hope it works properly? Or should I only test, say, one series of units (m, cm, dm, km, nm, um, and all other multiples of the meter), just enough to make sure the unit generation function seems to be working? Or something else?
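
For concreteness, the "one series" option might look something like this (again not the actual code -- assume the module imports as units, and that unit objects can be multiplied by plain numbers and compared for equality):

import unittest

import units  # hypothetical name for the module above


class TestMeterSeries(unittest.TestCase):
    # The expected multipliers are typed out by hand, not looped over, so a
    # mistake in the prefix table wouldn't be silently reproduced here.
    def test_meter_prefixes(self):
        self.assertEqual(units.km, 1000 * units.m)
        self.assertEqual(units.cm, 0.01 * units.m)
        self.assertEqual(units.mm, 0.001 * units.m)
        self.assertEqual(units.um, 1e-6 * units.m)
        self.assertEqual(units.nm, 1e-9 * units.m)

if __name__ == '__main__':
    unittest.main()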

+1  A: 

If you auto-generate the tests:

  • You might find it faster to then read all the tests (to inspect them for correctness) than it would have been to write them all by hand.

  • They might also be more maintainable (easier to edit, if you want to edit them later).

ChrisW
IMHO, the logic for generating the tests is barely less complicated (possibly *more* complicated) than the logic being tested -- if *it* is sufficiently simple to be "inspected" for correctness, then why not the original code being tested?
j_random_hacker
I'm not suggesting you inspect the logic for generating the tests: I'm suggesting you inspect the tests themselves. Isn't "by inspection" always the way in which you verify the implementation of a test? As for "why not inspect the original code?", well you'll do that too, obviously, when you write it ... but a reason for having automated tests is that you can rerun them (as regression tests) cheaply: expensive to write, expensive to inspect, but cheap to rerun; if you rely on inspection alone to test the original, then you need to (expensively) reinspect each time the original changes.
ChrisW
Sorry ChrisW, I misread your first point. If all the "generation" does is add some boilerplate to a manually curated list of tests (e.g. convert each line of a manually prepared tab-delimited text file listing parameters and expected return value into a Perl statement that calls a function and tests the result) that's fine, but I still think that if your test generation actually produces *multiple* tests from each unit of human-provided input, it weakens the tests.
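(In Python rather than Perl, that kind of table-driven boilerplate might look roughly like this -- assuming a hand-written expected_units.tsv, a module named units, and unit values that compare equal to plain numbers:)

import csv
import unittest

import units  # hypothetical module under test


class TestUnitsFromTable(unittest.TestCase):
    def test_values_from_curated_table(self):
        # expected_units.tsv is written by hand: one "name<TAB>value" per line.
        with open('expected_units.tsv') as f:
            for name, expected in csv.reader(f, delimiter='\t'):
                self.assertEqual(getattr(units, name), float(expected), msg=name)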
j_random_hacker
OTOH I can definitely see the usefulness of having regression tests. Even if the tests themselves are susceptible to the same bugs that might inhabit the code being tested, there's value in ensuring that the tested behaviour doesn't change over time. +1.
j_random_hacker
Well, often you're right, in the sense that "each unit of human-provided input" provides some value, but automatically multiplying that input might then provide no extra value. However, maybe there are some times when it is useful to auto-generate test cases. Consider a puzzle-solving program: you might auto-generate (many) puzzles and for each puzzle test that the solution is valid. Sometimes it's better to manually generate the puzzles (i.e. test cases) for known (known to the human tester) edge cases; occasionally (depending on the problem domain) you might want 1000s of tests for coverage.
ChrisW
Good point -- in many cases (e.g. your puzzle) it is much easier to verify that a given solution is correct than to generate a correct solution. (We're getting close to some core issues in computer science here... :))
j_random_hacker
+1  A: 

You're right to identify the weakness of automatically generating test cases. The usefulness of a test comes from taking two different paths (your code, and your own mental reasoning) to come up with what should be the same answer -- if you use the same path both times, nothing is being tested.

In summary: Never write automatically generated tests, unless the algorithm for generating the test results is dramatically simpler than the algorithm that you are testing. (Testing of a sorting algorithm is an example of when automatically generated tests would be a good idea, since it's easy to verify that a list of numbers is in sorted order. Another good example would be a puzzle-solving program as suggested by ChrisW in a comment. In both cases, auto-generation makes sense because it is much easier to verify that a given solution is correct than to generate a correct solution.)
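
As a rough sketch of the sorting case (my_sort and its module are placeholders for whatever implementation is under test): the inputs are auto-generated, but the checks -- adjacent ordering and the same multiset of elements -- are far simpler than the sorting itself.

import random
import unittest
from collections import Counter

from mysortmodule import my_sort  # hypothetical implementation under test


class TestSortOnGeneratedInputs(unittest.TestCase):
    def test_random_lists_come_back_sorted(self):
        random.seed(0)  # keep the generated cases reproducible
        for _ in range(1000):
            data = [random.randint(-100, 100) for _ in range(random.randint(0, 50))]
            result = my_sort(data)
            # Checking a result is trivial compared to producing it:
            self.assertTrue(all(a <= b for a, b in zip(result, result[1:])))
            self.assertEqual(Counter(result), Counter(data))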

My suggestion for your case: Manually test a small, representative subset of the possibilities.

[Clarification: Certain types of automated tests are appropriate and highly useful, e.g. fuzzing. I mean that it is unhelpful to auto-generate unit tests for generated code.]

j_random_hacker
I don't entirely agree. For example, imagine that the code to be tested is a parser and compiler: the tests to be auto-generated might then be various code fragments to be compiled. Parsing is complicated, but auto-generating various (many) code fragments is relatively easy, and code inspection will tell you whether the auto-generated code fragments are valid.
ChrisW
@ChrisW: I was hoping to capture that sort of automated testing under "fuzzing," but maybe that term is too specific. But the general idea of auto-generating many known-correct fragments (e.g. by randomly expanding rules in a grammar) and testing that the parser handles them without complaining is a good one.
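A minimal sketch of that grammar-expansion idea, using Python's own ast.parse as a stand-in for the parser under test:

import ast
import random


def gen_expr(depth=0):
    # Expand a tiny grammar: expr -> NUMBER | '(' expr op expr ')'
    if depth > 3 or random.random() < 0.4:
        return str(random.randint(0, 99))
    op = random.choice(['+', '-', '*'])
    return '(%s %s %s)' % (gen_expr(depth + 1), op, gen_expr(depth + 1))


def test_parser_accepts_generated_fragments():
    random.seed(0)
    for _ in range(500):
        fragment = gen_expr()
        # Every fragment is valid by construction, so parsing must not raise.
        ast.parse(fragment, mode='eval')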
j_random_hacker
+1  A: 

I would say the best approach is to unit test the generation itself, and as part of that, put a sample of the generated results under test (just enough to cover the cases you consider significantly different from one another) to make sure the generation is working correctly. Beyond that, there is little unit-test value in spelling out every scenario in an automated way. There may be functional-test value in putting together some functional tests that exercise the generated code for whatever purpose you have in mind, in order to give wider coverage to the various potential units.
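
One way to make the generation directly testable is to have add_unit write into an explicit namespace instead of globals() -- a small refactor of the sketch in the question. The prefix table and values here are made up for illustration:

import unittest

# Simplified prefix table for the test; the real module's table would be used instead.
PREFIXES = [('k', 1e3), ('c', 1e-2), ('m', 1e-3)]


def add_unit(name, value, namespace, prefixes=PREFIXES):
    # Same logic as before, but writing into a namespace passed in by the caller.
    namespace[name] = value
    for pfx, multiplier in prefixes:
        namespace[pfx + name] = multiplier * value


class TestAddUnit(unittest.TestCase):
    def test_generates_base_and_prefixed_names(self):
        ns = {}
        add_unit('m', 1.0, ns)
        self.assertEqual(ns['m'], 1.0)
        self.assertEqual(ns['km'], 1000.0)
        self.assertEqual(ns['cm'], 0.01)
        self.assertEqual(ns['mm'], 0.001)
        self.assertEqual(len(ns), 1 + len(PREFIXES))  # base unit plus one per prefix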

Yishai
+1  A: 

Write only just enough tests to make sure that your code generation works right (just enough to drive the design of the imperative code). Declarative code rarely breaks. You should only test things that can break. Mistakes in declarative code (such as in your case, or in user interface layouts, for example) are better found with exploratory testing, so writing extensive automated tests for them is a waste of time.

Esko Luontola