What's the best way to unit test large data sets? Some legacy code that I'm maintaining has structures of a hundred members or more; other parts of the code that we're working on create or analyze data sets of hundreds of samples.
The best approach I've found so far is to deserialize the structures or data sets from disk, perform the operations under test, serialize the results back to disk, then diff the files containing the serialized results against files containing expected results. This isn't terribly fast, and it violates the "don't touch the disk" principle of unit testing. However, the only alternative I can think of (writing code to initialize and check hundreds of members and data points by hand) seems unbearably tedious.
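For concreteness, here's a minimal sketch of that golden-file pattern in Python. The file names, the JSON format, and the `analyze` function are placeholders I've made up; the real code uses its own serialization, but the test has the same shape:

```python
import json
import unittest
from pathlib import Path

# Hypothetical paths to the stored input data set and the expected ("golden") results.
INPUT_FILE = Path("testdata/input_samples.json")
EXPECTED_FILE = Path("testdata/expected_results.json")
ACTUAL_FILE = Path("testdata/actual_results.json")


def analyze(samples):
    """Placeholder for the legacy analysis routine under test."""
    return [{"id": s["id"], "mean": sum(s["values"]) / len(s["values"])}
            for s in samples]


class GoldenFileTest(unittest.TestCase):
    def test_against_expected_results(self):
        # Deserialize the stored input data set from disk.
        samples = json.loads(INPUT_FILE.read_text())

        # Perform the operation under test, then serialize the results to disk.
        actual_text = json.dumps(analyze(samples), indent=2, sort_keys=True)
        ACTUAL_FILE.write_text(actual_text)

        # Diff the serialized results against the expected-results file.
        self.assertEqual(EXPECTED_FILE.read_text(), ACTUAL_FILE.read_text())


if __name__ == "__main__":
    unittest.main()
```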
Are there any better solutions?