I am writing a fairly complicated machine learning program for my thesis in computer vision. It's working fairly well, but I need to keep trying out new things and adding new functionality. This is problematic because I sometimes introduce bugs when I am extending the code or trying to simplify an algorithm.

Clearly the correct thing to do is to add unit tests, but it is not clear how to do this. Many components of my program produce somewhat subjective answers, so I cannot automate sanity checks.

For example, I had some code that approximated a curve with a lower-resolution curve, so that I could do computationally intensive work on the lower-resolution curve. I accidentally introduced a bug into this code, and only found it through a painstaking search when the results of my entire program got slightly worse.

But, when I tried to write a unit-test for it, it was unclear what I should do. If I make a simple curve that has a clearly correct lower-resolution version, then I'm not really testing out everything that could go wrong. If I make a simple curve and then perturb the points slightly, my code starts producing different answers, even though this particular piece of code really seems to work fine now.

+7  A: 

Without seeing your code it's hard to tell, but I suspect that you are attempting to write tests at too high a level. You might want to break your methods down into smaller components that are deterministic and test those first. Then test the methods that use them by providing mock implementations that return predictable values in place of the underlying methods (which are probably located on a different object). That way you can write tests that cover the domain of each method and ensure you have coverage of the full range of possible outcomes: for the small methods, by providing inputs that span their domain; for the methods that depend on them, by providing mock implementations that return the range of outcomes the dependencies can produce.
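
As a rough sketch of that idea (assuming a Python codebase; choose_breakpoints, the point data, and the stubbed error metric below are invented for illustration, not taken from the question's code), the simplification routine can take its error metric as a collaborator, so a test can replace the metric with a mock that returns predictable values:

import unittest
from unittest.mock import Mock

def choose_breakpoints(points, error_fn, max_error):
    # Keep an interior point only when the supplied metric says dropping
    # it would cost more than max_error (illustrative stand-in routine).
    kept = [points[0]]
    for i in range(1, len(points) - 1):
        if error_fn(kept[-1], points[i], points[i + 1]) > max_error:
            kept.append(points[i])
    kept.append(points[-1])
    return kept

class ChooseBreakpointsTest(unittest.TestCase):
    def test_keeps_only_points_the_metric_flags(self):
        # Stubbed metric: only the second interior point exceeds the threshold.
        error_fn = Mock(side_effect=[0.0, 5.0, 0.0])
        points = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0)]
        kept = choose_breakpoints(points, error_fn, max_error=1.0)
        self.assertEqual(kept, [(0, 0), (2, 0), (4, 0)])

if __name__ == "__main__":
    unittest.main()

Because the mock fixes the metric's outputs, the test pins down exactly which points the routine must keep, independent of any real geometry.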

tvanfosson
This advice was helpful. In the example case, I am doing the approximation via a dynamic program. This can be decomposed into several components which are deterministic: (1) calculating the error of a particular piece of the approximation, which I can just do by hand for some particular curve; (2) making sure the overall objective function is correct, which again I can do by hand; (3) making sure the dynamic program is correct (this is where the bug was).
forefinger
If I know the overall objective function is correct, I can test this by feeding in simply decomposable curves that have been perturbed. As long as the answer it gives me scores at least as well as the answer I expected, the dynamic program is probably working correctly.
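
A hedged sketch of that check (everything here is passed in as a parameter, since none of these names come from the original code; it assumes a lower objective value is better):

import random

def check_dp_not_worse_than_expected(simplify, objective, base_curve, expected,
                                     trials=20, noise=1e-3):
    # Property test: on small perturbations of a simple curve, the dynamic
    # program's output should never score worse (higher objective) than the
    # decomposition we worked out by hand.
    rng = random.Random(0)  # fixed seed keeps the test reproducible
    for _ in range(trials):
        curve = [(x + rng.uniform(-noise, noise), y + rng.uniform(-noise, noise))
                 for x, y in base_curve]
        assert objective(simplify(curve), curve) <= objective(expected, curve), \
            "dynamic program returned a worse-scoring approximation"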
forefinger
+6  A: 

"then I'm not really testing out everything that could go wrong."

Correct.

The job of unit tests is not to test everything that could go wrong.

The job of unit tests is to test that what you have does the right thing, given specific inputs and specific expected results. The important part here is that specific, visible, external requirements are satisfied by specific test cases. Not that every possible thing that could go wrong is somehow prevented.

Nothing can test everything that could go wrong. You can write a proof, but you'll be hard-pressed to write tests for everything.

Choose your test cases wisely.

Further, the job of unit tests is to test that each small part of the overall application does the right thing -- in isolation.

Your "code that approximated a curve with a lower-resolution curve" for example, probably has several small parts that can be tested as separate units. In isolation. The integrated whole could also be tested to be sure that it works.

Your "computationally intensive work on the lower-resolution curve" for example, probably has several small parts that can be tested as separate units. In isolation.

The point of unit testing is to create small, correct units that are later assembled.
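
For instance, one such small part might be a per-segment error metric, which can be tested in isolation against a value worked out by hand (segment_error below is a hypothetical stand-in, not the asker's actual function):

import math
import unittest

def segment_error(p0, p1, q):
    # Perpendicular distance from q to the line through p0 and p1
    # (a hypothetical per-segment error metric).
    (x0, y0), (x1, y1), (xq, yq) = p0, p1, q
    num = abs((x1 - x0) * (y0 - yq) - (x0 - xq) * (y1 - y0))
    return num / math.hypot(x1 - x0, y1 - y0)

class SegmentErrorTest(unittest.TestCase):
    def test_value_computed_by_hand(self):
        # (1, 1) sits exactly one unit above the segment from (0, 0) to (2, 0).
        self.assertAlmostEqual(segment_error((0, 0), (2, 0), (1, 1)), 1.0)

if __name__ == "__main__":
    unittest.main()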

S.Lott
This seems like reasonable advice, but it doesn't help with my specific issues as much as some of the other responses.
forefinger
"specific issues"? That's not easy to understand, since your question didn't seem to list any specific issues. Feel free to update your question if you want more information.
S.Lott
+1  A: 

Generally, for statistical measures you would build an epsilon into your check, i.e., require that the mean squared difference of your points be < 0.01 or some such. Another option is to run the test several times and only treat it as a failure if it fails "too often".
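
A small sketch of that kind of check (the curves and the 0.01 threshold are only illustrative):

EPSILON = 0.01  # tolerance; tune it to your data

def mean_squared_difference(expected, actual):
    # Average squared distance between corresponding 2-D points.
    return sum((ex - ax) ** 2 + (ey - ay) ** 2
               for (ex, ey), (ax, ay) in zip(expected, actual)) / len(expected)

# Inside a test, with actual standing in for the output of the code under test:
expected = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
actual = [(0.0, 0.01), (1.0, 0.99), (2.0, 0.0)]
assert mean_squared_difference(expected, actual) < EPSILON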

Joel
+4  A: 

Your unit tests need to employ some kind of fuzz factor, either by accepting approximate answers or by using probabilistic checks.

For example, if you have a function that returns a floating-point result, it is almost impossible to write an exact-equality test that works correctly across all platforms. Your checks need to allow some tolerance:

assert abs(result - 4.0) <= 0.1

Here the assertion accepts any result between 3.9 and 4.1 (for example).

Alternatively, if your machine learning algorithms are probabilistic, your tests will need to account for that by taking the average of multiple runs and expecting it to fall within some range:

x = 0.0
for _ in range(100):
    x += result_probabilistic_test()

avg = x / 100
assert 10.0 <= avg <= 15.0  # expected range for the averaged result

Of course, the tests are non-deterministic, so you will need to tune them so that they are non-flaky with high probability (e.g., increase the number of trials, or widen the accepted range of error).

You can also use mocks for this (e.g., a mock random number generator for your probabilistic algorithms). They usually help with testing specific code paths deterministically, but they are a lot of effort to maintain. Ideally, you would use a combination of fuzzy testing and mocks.
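
A minimal sketch of the mock-RNG idea (assuming the algorithm accepts an injectable random source; noisy_threshold and FakeRNG are invented for illustration):

import random

def noisy_threshold(value, rng=random):
    # Hypothetical probabilistic step: add Gaussian noise, then threshold.
    return value + rng.gauss(0.0, 0.1) > 0.5

class FakeRNG:
    # Stand-in random source that always returns zero noise,
    # making the algorithm deterministic for the test.
    def gauss(self, mu, sigma):
        return 0.0

assert noisy_threshold(0.6, rng=FakeRNG())
assert not noisy_threshold(0.4, rng=FakeRNG())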

HTH.

0xfe
This is good advice, but it doesn't really solve my problems, because many of the things I need to check are discrete, so no error measure can be computed for them and they cannot be meaningfully averaged.
forefinger
+3  A: 

You may not appreciate the irony, but basically what you have there is legacy code: a chunk of software without any unit tests. Naturally you don't know where to begin. So you may find it helpful to read up on handling legacy code, although the key lessons are the ones which S.Lott and tvanfosson cover in their replies.
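
One concrete technique from that literature is a characterization (or "golden") test: record what the current code produces for a few representative inputs and fail if that output ever changes, so you can refactor safely even when the "right" answer is subjective. A hypothetical sketch (simplify_curve and the reference file name are invented for illustration):

import json
import unittest

def simplify_curve(points):
    # Stand-in for the real routine being characterized.
    return points[::2]

class SimplifyCurveCharacterizationTest(unittest.TestCase):
    REFERENCE_FILE = "simplify_reference.json"

    def test_output_matches_recorded_reference(self):
        curve = [(float(i), float(i % 3)) for i in range(10)]
        result = simplify_curve(curve)
        try:
            with open(self.REFERENCE_FILE) as f:
                reference = [tuple(p) for p in json.load(f)]
        except FileNotFoundError:
            # First run: record today's behaviour as the reference.
            with open(self.REFERENCE_FILE, "w") as f:
                json.dump(result, f)
            reference = result
        self.assertEqual(result, reference)

if __name__ == "__main__":
    unittest.main()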

APC
This is actually the most helpful advice. All my successful debugging has resulted from using techniques like this by hand, but this PDF gives some good advice for automating the process. Your PDF link didn't work for me, but a simple Google search located it.
forefinger
@forefinger - I have now fixed the link. But I'm glad you found the article, and found it useful.
APC
The author of that PDF has an excellent book out now on the same subject: http://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052
TrueWill