Imagine that you have an internally controlled list of vendors. Now imagine that you want to match unstructured strings against that list. Most will be easy to match, but some may be all but impossible. The algorithm will assign a confidence to each match, but a human needs to confirm all matches produced.

How could this algorithm be unit tested? The only idea I have had so far is to take a sample of pairs matched by humans and make sure the algorithm is able to successfully match those, omitting strings that I couldn't reasonably expect our algorithm to handle. Is there a better way?
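
For concreteness, a test along those lines might look something like the sketch below; the vendor list, the sample pairs, and the difflib-based stand-in matcher are all placeholders for our real algorithm and data.

    import difflib
    import unittest

    VENDOR_LIST = ["Acme Corporation", "IBM Global Services", "Initech"]

    # Human-verified pairs, with the strings we can't reasonably expect
    # the algorithm to handle left out.
    HUMAN_VERIFIED = [
        ("Acme Corp.", "Acme Corporation"),
        ("IBM Global Svcs", "IBM Global Services"),
    ]

    def match_vendor(raw, vendors):
        """Stand-in matcher for illustration only: the vendor with the
        highest difflib similarity ratio wins, and that ratio doubles
        as the confidence score."""
        confidence, vendor = max(
            (difflib.SequenceMatcher(None, raw.lower(), v.lower()).ratio(), v)
            for v in vendors
        )
        return vendor, confidence

    class KnownMatchTests(unittest.TestCase):
        def test_human_verified_pairs_are_matched(self):
            for raw, expected in HUMAN_VERIFIED:
                vendor, _confidence = match_vendor(raw, VENDOR_LIST)
                self.assertEqual(expected, vendor)

    if __name__ == "__main__":
        unittest.main()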

+1  A: 

You can also test that the confidence for strings your algorithm won't handle well is sufficiently low. That way you can also see whether there is a threshold above which you can trust your algorithm.
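
A sketch of what such a test might look like, assuming a match_vendor function that returns a (vendor, confidence) pair and a trust threshold of your own choosing:

    import unittest

    # Hypothetical module wrapping your matching algorithm and vendor list.
    from vendor_matching import match_vendor, VENDOR_LIST

    # Strings the algorithm is known to handle poorly.
    HARD_STRINGS = ["misc vendor 123", "see attached invoice", "???"]
    TRUST_THRESHOLD = 0.6  # illustrative value; tune to your data

    class ConfidenceThresholdTests(unittest.TestCase):
        def test_hard_strings_stay_below_threshold(self):
            for raw in HARD_STRINGS:
                _vendor, confidence = match_vendor(raw, VENDOR_LIST)
                self.assertLess(confidence, TRUST_THRESHOLD,
                                "unexpectedly confident about %r" % raw)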

Janusz
That's an excellent idea. Confidence numbers should be covered in the tests (I hadn't thought of that), and this sounds like a good way to do it.
Stuart Branham
A: 

I don't think there's a better way than what you describe; effectively, you're just using a set of predefined data to test that the algorithm does what you expect. For any complicated algorithm with highly nonlinear inputs and outputs, that's about the best you can do: choose a good test set, and ensure that you run properly against that set of known values. If other values come up which need to be tested in the future, you can add them to the set of tested values.

McWafflestix
A: 

That sounds fair. If it's possible (given time constraints), get as large a sample of human matches as you can; that would give you a picture of how well your algorithm is doing. You could design specific unit tests that pass if the algorithm is within X% of correctness.
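
For example (the 90% figure, the sample data, and the match_vendor interface below are only assumptions):

    import unittest

    # Hypothetical module wrapping your matching algorithm and vendor list.
    from vendor_matching import match_vendor, VENDOR_LIST

    # In practice, load the full human-matched sample from wherever it lives.
    HUMAN_SAMPLE = [
        ("Acme Corp.", "Acme Corporation"),
        ("IBM Global Svcs", "IBM Global Services"),
    ]
    REQUIRED_ACCURACY = 0.90

    class AccuracyTests(unittest.TestCase):
        def test_accuracy_against_human_sample(self):
            correct = sum(1 for raw, expected in HUMAN_SAMPLE
                          if match_vendor(raw, VENDOR_LIST)[0] == expected)
            self.assertGreaterEqual(correct / len(HUMAN_SAMPLE),
                                    REQUIRED_ACCURACY)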

Best of luck.

Jon
+1  A: 

An interesting exercise would be to store the human answers that correct your algorithm and see whether you can improve the algorithm so it no longer gets them wrong.

If you can, add the new matches to the unit tests.
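
One simple way to collect those corrections, assuming nothing about your storage beyond a plain CSV file:

    import csv

    def record_correction(raw_string, algorithm_vendor, human_vendor,
                          path="corrections.csv"):
        """Append a human override so it can become a unit-test case later."""
        with open(path, "a", newline="") as f:
            csv.writer(f).writerow([raw_string, algorithm_vendor, human_vendor])

    def load_corrections(path="corrections.csv"):
        """Yield (raw_string, expected_vendor) pairs for the test suite."""
        with open(path, newline="") as f:
            for raw_string, _rejected, human_vendor in csv.reader(f):
                yield raw_string, human_vendor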

Hans Malherbe
That would be a fun exercise, actually. Unfortunately I don't think time constraints will allow, but it's something to "put on the list" nonetheless.
Stuart Branham
A: 

I think there are two issues here: whether your code behaves according to the algorithm, and how successful the algorithm itself is (i.e. it does not accept answers which a human later rejects, and does not reject answers a human would accept).

Issue 1 is regular testing. For issue 2, I would go with previous result sets (i.e. compare the algorithm's results to human ones).

Avi
A: 

What you describe is the best way, because which match is best is subjective; only a human can come up with the appropriate test cases.

rado
+3  A: 

I'd try some 'canonical' pairs, both "should match" and "shouldn't match" pairs, and test only whether the confidence is above (or below) a given threshold.

Maybe you can also do some ordering checks, such as "no pair should have greater confidence than the exact-match pair", or "a pair that matches all the consonants should score >= one that matches only the vowels".
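
Something like this, perhaps; the thresholds, the sample strings, and the match_vendor interface are placeholders:

    import unittest

    # Hypothetical module wrapping your matching algorithm and vendor list.
    from vendor_matching import match_vendor, VENDOR_LIST

    SHOULD_MATCH = [("Acme Corp.", "Acme Corporation")]
    SHOULD_NOT_MATCH = ["completely unrelated text"]

    class CanonicalPairTests(unittest.TestCase):
        def test_should_match_pairs_clear_the_threshold(self):
            for raw, expected in SHOULD_MATCH:
                vendor, confidence = match_vendor(raw, VENDOR_LIST)
                self.assertEqual(expected, vendor)
                self.assertGreaterEqual(confidence, 0.8)

        def test_should_not_match_strings_stay_low(self):
            for raw in SHOULD_NOT_MATCH:
                _vendor, confidence = match_vendor(raw, VENDOR_LIST)
                self.assertLess(confidence, 0.3)

        def test_nothing_beats_an_exact_match(self):
            # Ordering check: the exact-match pair should have the
            # greatest confidence of all.
            _, exact = match_vendor("Acme Corporation", VENDOR_LIST)
            _, fuzzy = match_vendor("Acme Corp.", VENDOR_LIST)
            self.assertGreaterEqual(exact, fuzzy)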

Javier
Testing failures would be a good idea as well, at least to make sure the algorithm's confidence isn't arbitrary. As you pointed out, discarding results that fall within a middle range (the 25th-75th percentile, maybe) would also be worthwhile.
Stuart Branham
A: 

It sounds as though you are describing an algorithm which is deterministic, but one which is sufficiently difficult that your best initial guess at the correct result is going to be whatever your current implementation delivers to you (i.e. a deterministic implementation to satisfy fuzzy requirements).

For those sorts of circumstances, I will use a "Guru Checks Changes" pattern. Generate a collection of inputs, record the outputs, and in subsequent runs of the unit tests, verify that the outcome is consistent with the previous results. Not so great for ensuring that the target algorithm is implemented correctly, but it is effective for ensuring that the most recent refactoring hasn't changed the behavior in the test space.

A variation of this, which may be more palatable for your circumstance, is to start from the same initial data collection, but rather than trying to preserve precisely the same result every time, you predefine some confidence buckets and flag any implementation change that moves a test result from one bucket to another.
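
A rough sketch of that bucketed variation, where the bucket edges, the baseline file, and the match_vendor interface are all assumptions:

    import json
    import unittest

    # Hypothetical module wrapping your matching algorithm and vendor list.
    from vendor_matching import match_vendor, VENDOR_LIST

    BUCKET_EDGES = [0.0, 0.25, 0.5, 0.75, 1.01]

    def bucket_of(confidence):
        """Map a confidence score to the index of its predefined bucket."""
        for i in range(len(BUCKET_EDGES) - 1):
            if BUCKET_EDGES[i] <= confidence < BUCKET_EDGES[i + 1]:
                return i
        raise ValueError("confidence out of range: %r" % confidence)

    class BucketRegressionTests(unittest.TestCase):
        def test_confidence_buckets_are_stable(self):
            # baseline.json maps each recorded input to the bucket observed
            # on the last run a human signed off on.
            with open("baseline.json") as f:
                baseline = json.load(f)
            for raw, expected_bucket in baseline.items():
                _vendor, confidence = match_vendor(raw, VENDOR_LIST)
                self.assertEqual(expected_bucket, bucket_of(confidence),
                                 "confidence bucket changed for %r" % raw)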

Samples that have clearly correct answers (exact matches, null matches, high-value corner cases) should be kept in a separate test.

VoiceOfUnreason
What if I enhance the algorithm somehow to produce better results? What if I do that *and* refactor? It's not a bad idea, but in my case I'm not sure I would be able to trust that test.
Stuart Branham