Spreading test data across multiple small data sets seems to me to create a maintenance headache whenever the schema is tweaked. Does anybody see a problem with creating a single larger test data set? By "larger" I'm still only talking about a couple hundred records in total.
I would not use a single large dataset (you want to avoid any overhead you don't need). Instead, I would follow DbUnit's Best Practices recommendations:
Use multiple small datasets
Most of your tests do not require the entire database to be re-initialized. So, instead of putting your entire database data in one large dataset, try to break it into many smaller chunks.
These chunks could roughly correspond to logical units, or components. This reduces the overhead caused by initializing your database for each test. This also facilitates team development since many developers working on different components can modify datasets independently.
For integrated testing, you can still use the CompositeDataSet class to logically combine multiple datasets into a large one at run time.
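To make the idea of combining small datasets at run time concrete, here is a minimal plain-Java sketch. It is not DbUnit's API: the `DataSetSketch` class, its `combine` and `row` helpers, and the `CUSTOMER`/`ORDERS` tables are hypothetical stand-ins that only model a dataset as a map from table name to rows. In DbUnit itself, the equivalent step would be to build a `CompositeDataSet` from an array of `IDataSet` instances.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a dataset: table name -> rows (column name -> value).
// This only illustrates the composition idea; it is NOT DbUnit's API.
public class DataSetSketch {

    /** Builds one row from alternating column names and values. */
    public static Map<String, Object> row(Object... keyValues) {
        Map<String, Object> r = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            r.put((String) keyValues[i], keyValues[i + 1]);
        }
        return r;
    }

    /** Merges small datasets into one; rows for the same table are appended in order. */
    public static Map<String, List<Map<String, Object>>> combine(
            List<Map<String, List<Map<String, Object>>>> parts) {
        Map<String, List<Map<String, Object>>> combined = new LinkedHashMap<>();
        for (Map<String, List<Map<String, Object>>> part : parts) {
            for (Map.Entry<String, List<Map<String, Object>>> table : part.entrySet()) {
                combined.computeIfAbsent(table.getKey(), k -> new ArrayList<>())
                        .addAll(table.getValue());
            }
        }
        return combined;
    }

    /** Small dataset owned by the (hypothetical) customer component. */
    public static Map<String, List<Map<String, Object>>> customers() {
        Map<String, List<Map<String, Object>>> ds = new LinkedHashMap<>();
        ds.put("CUSTOMER", List.of(row("ID", 1, "NAME", "Alice")));
        return ds;
    }

    /** Small dataset owned by the (hypothetical) order component. */
    public static Map<String, List<Map<String, Object>>> orders() {
        Map<String, List<Map<String, Object>>> ds = new LinkedHashMap<>();
        ds.put("ORDERS", List.of(row("ID", 10, "CUSTOMER_ID", 1)));
        return ds;
    }

    public static void main(String[] args) {
        // A component test loads only its own small dataset; an integration
        // test combines the pieces it needs at run time.
        Map<String, List<Map<String, Object>>> all = combine(List.of(customers(), orders()));
        System.out.println(all.keySet()); // prints [CUSTOMER, ORDERS]
    }
}
```

The point of the design is that each component keeps ownership of its own small file, and only the tests that really span components pay the cost of loading the combined data.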
Some more feedback from the Unitils folks:
Automatic test database maintenance
When writing database tests, keep in mind the following guidelines:
- Use small sets of test data, containing as little data as possible. In your data files, only specify columns that are used in join columns or the where clause of the tested query.
- Make data sets test class specific. Don't reuse data sets between different test classes; for example, do not use one big domain data set for all your test classes. Doing so will make it very difficult to change the test data for one test without breaking another test. You are writing a unit test, and such a test should be independent of other tests.
- Don't use too many data sets. The more data sets you use, the more maintenance is needed. Try to reuse the testclass data set for all tests in that testclass. Only use method data sets if it makes your tests more understandable and clear.
- Limit the use of expected result data sets. If you do use them, only include the tables and columns that are important for the test and leave out the rest.
- Use a database schema per developer. This allows developers to insert test data and run tests without interfering with each other.
- Disable all foreign key and not null constraints on the test databases. This way, the data files need to contain no more data than absolutely necessary.
Using small datasets with just enough data has worked decently for us in the past. Sure, there is some maintenance when you tweak the schema, but this is manageable with some organization.