What's the best way to unit test large data sets? Some legacy code that I'm maintaining has structures of a hundred members or more; other parts of the code that we're working on create or analyze data sets of hundreds of samples.
The best approach I've found so far is to deserialize the structures or data sets from disk, perform the operations under test, serialize the results back to disk, then diff the files containing the serialized results against files containing expected results. This isn't terribly fast, and it violates the "don't touch the disk" principle of unit testing. However, the only alternative I can think of (writing code to initialize and check hundreds of members and data points by hand) seems unbearably tedious.
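For concreteness, here's a minimal sketch of that golden-file pattern in Python. The file names, the JSON format, and the `analyze` function are placeholders I've made up; the real code uses its own serialization, but the test has the same shape:

```python
import json
import unittest
from pathlib import Path

# Hypothetical paths to the stored input data set and the expected ("golden") results.
INPUT_FILE = Path("testdata/input_samples.json")
EXPECTED_FILE = Path("testdata/expected_results.json")
ACTUAL_FILE = Path("testdata/actual_results.json")


def analyze(samples):
    """Placeholder for the legacy analysis routine under test."""
    return [{"id": s["id"], "mean": sum(s["values"]) / len(s["values"])}
            for s in samples]


class GoldenFileTest(unittest.TestCase):
    def test_against_expected_results(self):
        # Deserialize the stored input data set from disk.
        samples = json.loads(INPUT_FILE.read_text())

        # Perform the operation under test, then serialize the results to disk.
        actual_text = json.dumps(analyze(samples), indent=2, sort_keys=True)
        ACTUAL_FILE.write_text(actual_text)

        # Diff the serialized results against the expected-results file.
        self.assertEqual(EXPECTED_FILE.read_text(), ACTUAL_FILE.read_text())


if __name__ == "__main__":
    unittest.main()
```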
Are there any better solutions?