I have a function that performs a hierarchical clustering on a list of input vectors. The return value is the root element of an object hierarchy, where each object represents a cluster. I want to test the following things:

  1. Does each cluster contain the correct elements (and maybe other properties as well)?
  2. Does each cluster point to the correct children?
  3. Does each cluster point to the correct parent?

I have two problems here. First, how do I specify the expected output in a readable format? Second, how do I write a test assertion that accepts isomorphic variants of the expected data I provide? Suppose one cluster in the expected hierarchy has two children, A and B. Now suppose that cluster is represented by an object with the properties child1 and child2. I do not care whether child1 corresponds to cluster A or B, just that it corresponds to one of them, and that child2 corresponds to the other. The solution should be somewhat general because I will write several tests with different input data.

Actually my main problem here is to find a way to specify the expected output in a readable and understandable way. Any suggestions?

A: 

This is an off-the-top-of-my-head suggestion. It is a bit roundabout as well. Caveat emptor!

First, write a function to create a string representation of a cluster. You will have to write unit tests to ensure that this function works in all cases. The format could be custom or XML (not exactly human friendly, but usually easy to work with for hierarchical data). You can invoke this function by passing in a cluster: string_representation(cluster).

Second, write a variant of this to generate the same output without passing in an actual cluster. Something like util.test.generate_string_representation('child1', 'child2').

Third, modify your unit test assertions to compare the output of string_representation(cluster) with generate_string_representation('child1', 'child2') as the case may be.

actual = string_representation(f(*args, **kwargs))
expected = generate_string_representation('child1', 'child2')
self.assertEqual(actual, expected)

Make sure that both string functions use the same mechanism to format their output. You don't want to end up chasing minute differences in strings.
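One way to make the string comparison tolerate swapped children is to render the children in sorted order, so that isomorphic hierarchies produce the same string. The following is only a minimal sketch: the Cluster class, its attributes (elements, child1, child2, parent) and the output format are assumptions made for illustration, since the question does not show the real class.

```python
class Cluster:
    """Hypothetical cluster node; the real class from the question is not shown."""

    def __init__(self, elements, child1=None, child2=None):
        self.elements = frozenset(elements)
        self.child1 = child1
        self.child2 = child2
        self.parent = None
        # Link the children back to this node.
        for child in (child1, child2):
            if child is not None:
                child.parent = self


def string_representation(cluster):
    """Render a cluster as a nested string with children in sorted order."""
    label = "{" + ",".join(sorted(str(e) for e in cluster.elements)) + "}"
    children = [c for c in (cluster.child1, cluster.child2) if c is not None]
    if not children:
        return label
    # Sorting the rendered children makes the result independent of
    # which child happens to be stored in child1 and which in child2.
    rendered = sorted(string_representation(c) for c in children)
    return label + "(" + " ".join(rendered) + ")"
```

With this format, the expected value in a test can be written directly as a string such as "{1,2}({1} {2})", which may be readable enough for small hierarchies.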

Told you, it is quite hackish. I hope others have better answers.

Manoj Govindan
+2  A: 

If there are isomorphic results, you should probably have a predicate that can test for logical equivalence. This would likely be good for your code unit as well as helping to implement the unit test.

This is the core of Manoj Govindan's answer without the string intermediates. Since you (presumably) aren't interested in the strings themselves, adding them to the test regime would be an unnecessary source of error.

As to the readability issue, you'd need to show what you consider unreadable for a proper answer to be given. Perhaps the equivalence predicate will obviate this.
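An equivalence predicate of this kind could be sketched as follows. This is an illustration under assumptions, not the asker's actual code: the Cluster class and its attributes (elements, child1, child2, parent) are hypothetical, as are the function names equivalent and parents_consistent.

```python
class Cluster:
    """Hypothetical cluster node; the real class from the question is not shown."""

    def __init__(self, elements, child1=None, child2=None):
        self.elements = frozenset(elements)
        self.child1 = child1
        self.child2 = child2
        self.parent = None
        for child in (child1, child2):
            if child is not None:
                child.parent = self


def equivalent(a, b):
    """Return True if two hierarchies match, ignoring child order."""
    if a is None or b is None:
        return a is b
    if a.elements != b.elements:
        return False
    # Accept either assignment of children: straight or swapped.
    straight = equivalent(a.child1, b.child1) and equivalent(a.child2, b.child2)
    swapped = equivalent(a.child1, b.child2) and equivalent(a.child2, b.child1)
    return straight or swapped


def parents_consistent(cluster, expected_parent=None):
    """Return True if every node's parent attribute points at its actual parent."""
    if cluster is None:
        return True
    if cluster.parent is not expected_parent:
        return False
    return (parents_consistent(cluster.child1, cluster)
            and parents_consistent(cluster.child2, cluster))
```

In a test this might be used as self.assertTrue(equivalent(actual, expected)) followed by self.assertTrue(parents_consistent(actual)), which covers the element, child and parent checks from the question without any string intermediates.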

msw
+1. Agree completely.
Manoj Govindan
A: 

It feels like there may be some room for breaking your method into smaller pieces. The pieces that parse input and format output could be separate from the actual clustering logic. That way the tests around your clustering methods would be fewer and would deal with easily understood and testable data structures like dicts and lists.
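As a rough sketch of that idea, the object hierarchy could be converted into plain built-in values before comparing. The version below uses nested tuples and frozensets rather than dicts and lists, because frozensets compare equal regardless of order. The Cluster class and the helper names to_plain and expect are hypothetical, and the sketch assumes sibling clusters never have identical contents (a frozenset would collapse them).

```python
class Cluster:
    """Hypothetical cluster node; the real class from the question is not shown."""

    def __init__(self, elements, child1=None, child2=None):
        self.elements = frozenset(elements)
        self.child1 = child1
        self.child2 = child2


def to_plain(cluster):
    """Convert a cluster hierarchy into nested (elements, children) tuples."""
    children = [c for c in (cluster.child1, cluster.child2) if c is not None]
    # A frozenset of children ignores which one was child1 or child2.
    return (frozenset(cluster.elements), frozenset(to_plain(c) for c in children))


def expect(elements, *children):
    """Build the expected plain structure in the same shape as to_plain."""
    return (frozenset(elements), frozenset(children))


# Example: the expected hierarchy is written as nested expect(...) calls.
root = Cluster([1, 2, 3], Cluster([3]), Cluster([1, 2], Cluster([1]), Cluster([2])))
expected = expect([1, 2, 3],
                  expect([1, 2], expect([1]), expect([2])),
                  expect([3]))
matches = to_plain(root) == expected
```

The nested expect(...) calls mirror the shape of the tree, which may help with the readability concern, and a plain equality assertion is then enough in the test.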

Kozyarchuk