We have several different optimization algorithms that produce a different result for each run. For example, the goal of the optimization could be to find the minimum of a function, where 0 is the global minimum. An optimization run returns data like this:

[0.1, 0.1321, 0.0921, 0.012, 0.4]

Which is quite close to the global minimum, so this is OK. Our first approach was to just choose a threshold and let the unit test fail if a result occurred that was too high. Unfortunately, this does not work at all: the results seem to follow a Gaussian distribution, so, although unlikely, from time to time the test failed even though the algorithm was still fine and we simply had bad luck.

So, how can I test this properly? I think quite a bit of statistics is needed here. It is also important that the tests stay fast; just letting the test run a few hundred times and taking the average would be too slow.

Here are some further clarifications:

  • For example, I have an algorithm that fits a circle into a set of points. It is extremely fast but does not always produce the same result. I want to write a unit test to guarantee that in most cases it is good enough.

  • Unfortunately I cannot choose a fixed seed for the random number generator, because I do not want to test whether the algorithm produces the exact same result as before; I want to test something like "With 90% certainty I get a result of 0.1 or better".

+5  A: 

Let the tests run, and if any of them fail, rerun just those tests 50 times and see what proportion of the time they fail. (In an automated way, of course.)
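A minimal sketch of how that could be automated with NUnit (the FlakyAssert helper, the RunOptimizer call and the 40-out-of-50 threshold are illustrative, not part of the question's code):

using System;
using NUnit.Framework;

public static class FlakyAssert
{
    // If a stochastic check fails once, rerun it a number of times and only
    // report a real failure when it fails more often than the allowed rate.
    public static void HoldsMostOfTheTime(Func<bool> check, int reruns = 50, int requiredPasses = 40)
    {
        if (check())
            return; // passed on the first attempt, nothing more to do

        int passes = 0;
        for (int i = 0; i < reruns; i++)
            if (check())
                passes++;

        Assert.GreaterOrEqual(passes, requiredPasses,
            "Stochastic check passed only {0} of {1} reruns", passes, reruns);
    }
}

// Usage in a test, with a hypothetical RunOptimizer() returning the achieved minimum:
// FlakyAssert.HoldsMostOfTheTime(() => RunOptimizer() < 0.1);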

Jon Skeet
+1 nice approach. Wouldn't you want to restrict it just to those tests where this is expected? Perhaps even wrap them up in their own suite/runner, so that a failure rate above a certain percentage triggers a real failure bubbling up to the test runner.
MrWiggles
A random result from a unit test is a bad thing. It prevents you from using automated testing during the build process to identify a "broken" build.
Matthew Brubaker
@Matthew: Why does it stop you from doing that? You can automate all of this - and yes MrWiggles, it would be worth annotating those tests explicitly. Perhaps give them an "if it fails once, retest 50 times and it must pass at least 40 times out of those 50" style annotation.
Jon Skeet
Why leave that to chance? Why not unit test in a way that will force working code to pass all the time?
Matthew Brubaker
I'm sorry, but a unit test passing 40 out of 50 times it is run is still failing.
Matthew Brubaker
+1 agree. Sometimes code has to produce non-deterministic results, so a unit test's result cannot be deterministic either. This solution will catch broken code, is automatic, and has a negligible chance of false positives.
Gilad Naor
Such as? Any time you use a random number to dictate what to do, you know what ranges of numbers will result in which parts of the code being run. You can then test that piece of code with plugged values. If not, you need to look at refactoring to make smaller methods that can be tested.
Matthew Brubaker
Such as any sort of optimization based on a random walk. Each tiny component has an inherent randomness to it. There is a negligible chance of a way-off result. If there's a real problem, it will stay after X runs. Otherwise, it won't.
Gilad Naor
How do you distinguish between a test failing 20% of the time because of statistics vs. a test failing 15% of the time because of statistics and 5% of the time because of a bug?
Rasmus Faber
+1 Good approach.
Kevin
+7  A: 

A unit test should never have an unknown pass/fail state. If your algorithm is returning different values when run with the same inputs multiple times, you are probably doing something screwy in your algorithm.

I would take each of the 5 optimization algorithms and test them to make sure that given a set of inputs x, you get back an optimized value of y every time.

EDIT: To address random components of your system, you can either introduce the ability to pass in the seed for the random number generator to be used, or you can use a mocking library (a la RhinoMocks) to force it to return a particular number whenever the RNG is asked for a random one.
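A minimal sketch of the seed-injection variant (Optimizer/Optimize are placeholder names, not the actual code under discussion):

using System;
using NUnit.Framework;

public class Optimizer
{
    private readonly Random rng;

    // Production code can pass a time-based seed; tests pass a fixed one,
    // so every test run exercises exactly the same path.
    public Optimizer(int seed)
    {
        rng = new Random(seed);
    }

    public double Optimize(double[] input)
    {
        // ... randomized optimization using rng ...
        return rng.NextDouble(); // placeholder body
    }
}

[TestFixture]
public class OptimizerSeedTests
{
    [Test]
    public void SameSeedAndInput_GivesSameResult()
    {
        var input = new[] { 1.0, 2.0, 3.0 };
        double first = new Optimizer(42).Optimize(input);
        double second = new Optimizer(42).Optimize(input);

        Assert.AreEqual(first, second); // fully deterministic with a fixed seed
    }
}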

Matthew Brubaker
I would regard tailoring a test to a specific set of "random" numbers as rather more fragile than seeing whether or not it does well most of the time with a *real* PRNG. Yes, reproducibility is generally good - but not at all costs.
Jon Skeet
Should unit tests ever *really* be used to figure out if something works well? Unit testing, like individual tests, is pass/fail: does the feature work as advertised? Whether it's effective is a whole different ballgame, with different rules. Don't hit a nail with a screwdriver.
+6  A: 

Your algorithms probably have a random component. Bring it under control.

You can either

  1. Allow the caller to choose the seed for the random number generator. Then use a hardcoded seed in the tests.
  2. Have the caller provide a random number generator. Then use a fake random number generator in the tests.

The second option is probably the best, since that will make it easier for you to reason about what the correct result of the algorithm is.
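For example (a sketch; the IRandomSource name and its implementations are made up for illustration):

using System;

// The algorithm asks this interface for randomness instead of creating System.Random itself.
public interface IRandomSource
{
    double NextDouble();
}

// Production implementation: simply delegates to System.Random.
public class SystemRandomSource : IRandomSource
{
    private readonly Random random = new Random();
    public double NextDouble() { return random.NextDouble(); }
}

// Test fake: plays back a predetermined sequence, so the test knows exactly
// which "random" decisions the algorithm will make.
public class FakeRandomSource : IRandomSource
{
    private readonly double[] values;
    private int index;

    public FakeRandomSource(params double[] values) { this.values = values; }

    public double NextDouble() { return values[index++ % values.Length]; }
}

The unit test constructs the optimizer with a FakeRandomSource and can then work out on paper exactly what the correct output must be.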

When unit-testing algorithms, what you want to verify is that you have correctly implemented the algorithm. Not whether the algorithm does what it is supposed to do. Unit-tests should not treat the code-under-test as a black box.

You may want to have a separate "performance"-test to compare how different algorithms perform (and whether they actually work), but your unit-tests are really for testing your implementation of the algorithm.

For example, when implementing the Foo-Bar-Baz Optimization Algorithm (TM) you might have accidentally written x:=x/2 instead of x:=x/3. This might mean that the algorithm runs more slowly but still finds the same solution. You will need white-box testing to find such an error.

Edit:

Unfortunately I cannot choose a fixed seed for the random number generator, because I do not want to test whether the algorithm produces the exact same result as before; I want to test something like "With 90% certainty I get a result of 0.1 or better".

I cannot see any way to make a test that is both automatically verifiable and stochastic. Especially not if you want to have any chance of distinguishing real errors from statistical noise.

If you want to test "With 90% certainty I get a result of 0.1 or better", I would suggest something like:

double expectedResult = ...;
double resultMargin = 0.1;
int successes = 0;
for(int i=0;i<100;i++){
  int randomSeed = i; // deterministic: run i always uses seed i
  double result = optimizer.Optimize(randomSeed);
  if(Math.Abs(result - expectedResult) < resultMargin)
    successes++;
}
Assert.GreaterOrEqual(successes, 90); // at least 90 of the 100 seeded runs must be good enough

(Note that this test is deterministic).

Rasmus Faber
+10  A: 

It sounds like your optimizer needs two kinds of testing:

  1. testing the overall effectiveness of the algorithm
  2. testing the integrity of your implementation of the algorithm

Since the algorithm involves randomization, (1) is difficult to unit-test. Any test of a random process will fail some proportion of the time. You need to know some statistics to understand just how often it should fail. There are ways to trade off between how strict your test is and how often it fails.

But there are ways to write unit tests for (2). For example, you could reset the seed to a particular value before running your unit tests. Then the output is deterministic. That would not allow you to assess the average effectiveness of the algorithm, but that's for (1). Such a test would serve as a trip wire: if someone introduced a bug into the code during maintenance, a deterministic unit test might catch the bug.

There may be other things that could be unit tested. For example, maybe your algorithm is guaranteed to return values in a certain range no matter what happens with the randomized part. Maybe some value should always be positive, etc.
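A hypothetical example of such an invariant test (the Optimizer class and the non-negative error metric are assumptions about the interface, not the poster's code):

using NUnit.Framework;

[TestFixture]
public class OptimizerInvariantTests
{
    [Test]
    public void ReportedError_IsNeverNegative()
    {
        var input = new[] { 0.0, 1.0, 2.0, 3.0 };
        var optimizer = new Optimizer(); // hypothetical class under test

        // Run a handful of times: whatever the random component does,
        // an error that represents a distance can never be negative.
        for (int i = 0; i < 20; i++)
        {
            double error = optimizer.Optimize(input);
            Assert.GreaterOrEqual(error, 0.0);
        }
    }
}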

Update: I wrote a chapter about this problem in the book Beautiful Testing. See Chapter 10: Testing a Random Number Generator.

John D. Cook
+1  A: 

I would suggest that, rather than having your test run against the code producing the Gaussian distribution, you create a Monte Carlo-type test that runs the method many times and then checks the overall distribution of results against the appropriate distribution model. If it is an average, for example, you will be able to test against a firm threshold. If it is more complex, you will need to write code that models the appropriate distribution (e.g., do values < x make up y% of my results?).
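A sketch of that idea, testing the proportion of runs that land under a threshold (the class name, run count and margin are all illustrative):

using NUnit.Framework;

[TestFixture]
public class OptimizerDistributionTests
{
    [Test]
    public void MostRunsAreCloseToTheGlobalMinimum()
    {
        const int runs = 200;
        const double threshold = 0.1;
        var input = new[] { 0.0, 1.0, 2.0, 3.0 };
        var optimizer = new Optimizer(); // hypothetical class under test

        int goodEnough = 0;
        for (int i = 0; i < runs; i++)
            if (optimizer.Optimize(input) < threshold)
                goodEnough++;

        // The target is ~90%, but assert against a slightly lower bar so that
        // ordinary sampling noise around 90% does not make the test flicker.
        Assert.GreaterOrEqual(goodEnough, (int)(runs * 0.85));
    }
}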

Keep in mind that you aren't testing the number generator; you are testing the unit that generates the values!

Mark Brittingham
A: 

Both JUnit and NUnit can assert floating-point values with a tolerance/delta. That is, you test whether the output is the correct value, give or take some tolerance. In your case the correct value you want to check against is 0, with a tolerance of 0.5 if you want all the values in the given output to pass (or 0.20 with tolerance +/-0.20).
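In NUnit that is the delta overload of Assert.AreEqual; roughly (the optimizer call is a placeholder):

using NUnit.Framework;

[TestFixture]
public class ToleranceTests
{
    [Test]
    public void ResultIsWithinToleranceOfTheGlobalMinimum()
    {
        double result = new Optimizer().Optimize(new[] { 0.0, 1.0, 2.0 }); // placeholder call

        // Passes when |0.0 - result| <= 0.5, i.e. the result only has to be
        // within the tolerance, not exactly equal to zero.
        Assert.AreEqual(0.0, result, 0.5);
    }
}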

Because of the random nature of your results, you may want to unit test parts of the algorithm to make sure it really does what it is supposed to.

Spoike
+1  A: 

Thanks for all the answers, I am now doing this:

  1. Run the test 5 times and take the median result.
  2. If the median result is below a certain threshold, the test succeeds.
  3. If the threshold is not met, test again until either the threshold is met (the test succeeds) OR I have done so many iterations (around 100 or so) that I can be pretty sure that the median will not get below the threshold any more.

This way, whenever a test looks like it is going to fail, it is rerun until it is pretty certain that it really has failed.
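Roughly, the test logic looks like this (a sketch; RunOptimizer() stands in for the real optimization call and the constants are illustrative):

using System;
using System.Collections.Generic;
using System.Linq;
using NUnit.Framework;

[TestFixture]
public class MedianRetryTests
{
    private const double Threshold = 0.1;
    private const int BatchSize = 5;
    private const int MaxRuns = 100;

    private static readonly Random Rng = new Random();

    [Test]
    public void MedianResultDropsBelowThreshold()
    {
        var results = new List<double>();

        while (results.Count < MaxRuns)
        {
            // Run another batch and look at the median over everything so far.
            for (int i = 0; i < BatchSize; i++)
                results.Add(RunOptimizer());

            if (Median(results) < Threshold)
                return; // good enough, the test passes
        }

        Assert.Fail("Median never dropped below {0} after {1} runs", Threshold, results.Count);
    }

    private static double Median(List<double> values)
    {
        var sorted = values.OrderBy(v => v).ToList();
        int mid = sorted.Count / 2;
        return sorted.Count % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    private static double RunOptimizer()
    {
        // Placeholder for the real randomized optimization run.
        return Rng.NextDouble() * 0.2;
    }
}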

This seems to work, but I am not quite satisfied because I am only testing the median result.

martinus
I still say that you need to make your test deterministic. Consider this: in your described setup, if you make a mistake that means that 20% of the time you get a completely wrong answer, your test will still almost always succeed. You are not testing very much then.
Rasmus Faber