Hello all,
     Before we start, I know a fair few people consider tests that hit the database not to be "unit tests". Maybe "integration tests" would be a better name. Either way, these are developer tests that hit the database.

     To enable unit testing I have a developer-local database which I clear and then populate with a known set of data at the start of each test, using dbUnit. This all works well enough until a table used by the test changes in some way and I have to manually update all the XML datasets, which is a pain. I figure other people must have hit the same problem and hopefully found a nice neat solution to it. So for tests that require populating a database, what do you use, and how do you handle table definitions changing? (While I use Java I am open to solutions utilizing different technologies.)

EDIT: To clarify a little. I have a contrived test like:

@Test
public void testLoadRevision() {
    database.clear(); // Clears every table dbUnit knows about.
    database.load("load/trevision.xml", "load/tissue.xml");
    SomeDatabaseThingie subject = new SomeDatabaseThingie(databaseProvider);
    Revision actual = subject.load();
    assertEquals(expected, actual); // 'expected' is built elsewhere in this contrived example
}

In that I have two tables - tRevision and tIssue. A loaded revision uses a small amount of data from tIssue. Later on, tIssue acquires a new field that revisions do not care about. As the new field is "not null" and has no sensible default, this test will fail because tIssue.xml will be invalid.

With small changes like this it is not too hard to edit tIssue.xml. But when the number of XML files starts to balloon with each flow, it becomes a large amount of work.

Cheers,
    mlk

+1  A: 

I think the answer to this question comes in two phases:

There is only one authoritative definition of the schema

There should be only one definition of what the database looks like. In normal cases, I prefer to have a SQL DDL script that specifies the schema for the database.

The unit tests should use the same authoritative definition of the database schema as the application uses, and it should create the database based on that definition before the test run and remove it completely again after the test run.
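
A minimal sketch of that idea, assuming an in-memory H2 database and an authoritative DDL script at db/schema.sql (both the URL and the path are assumptions for illustration):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Builds a throwaway test database from the authoritative DDL script
// before the test run.
public class SchemaBootstrap {
    public static Connection createDatabaseFromDdl() throws Exception {
        Connection connection = DriverManager.getConnection("jdbc:h2:mem:test");
        String ddl = new String(Files.readAllBytes(Paths.get("db/schema.sql")));
        try (Statement statement = connection.createStatement()) {
            // Assumes the script separates statements with semicolons.
            for (String sql : ddl.split(";")) {
                if (!sql.trim().isEmpty()) {
                    statement.execute(sql);
                }
            }
        }
        return connection; // closing the last connection drops the in-memory db
    }
}

Each test run then starts from a database that cannot be out of sync with the checked-in schema.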

That said, tooling may get out of sync with the schema, and you will need to manually update the tool-generated artifacts. For example, I use the Entity Framework for .NET, which auto-generates classes based on the database schema. When I change the schema, I need to manually tell my tool to update these classes. It's a pain, but I'm not aware of any way around that unless the tooling supports automation.

Each test should start with empty data

Each test should start with the database without any data. Every test should populate only the data it needs to execute the test, and when it is done, it should clean out the database again.
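
For example, a per-test fixture along these lines (a sketch using dbUnit's CLEAN_INSERT and DELETE_ALL operations; the file name and wiring are placeholders):

import java.io.File;
import java.sql.Connection;
import org.dbunit.database.DatabaseConnection;
import org.dbunit.database.IDatabaseConnection;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSetBuilder;
import org.dbunit.operation.DatabaseOperation;

// Per-test fixture: load exactly the rows this test needs, then clean up.
public class RevisionTestFixture {
    private final IDatabaseConnection connection;
    private IDataSet dataSet;

    public RevisionTestFixture(Connection jdbcConnection) throws Exception {
        this.connection = new DatabaseConnection(jdbcConnection);
    }

    public void setUp() throws Exception {
        dataSet = new FlatXmlDataSetBuilder().build(new File("load/trevision.xml"));
        // CLEAN_INSERT deletes and reloads only the tables named in the dataset.
        DatabaseOperation.CLEAN_INSERT.execute(connection, dataSet);
    }

    public void tearDown() throws Exception {
        // Leaves the schema in place but removes this test's rows.
        DatabaseOperation.DELETE_ALL.execute(connection, dataSet);
    }
}

Because CLEAN_INSERT only touches the tables named in the dataset, each test stays independent of what other tests load.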

What you are currently doing sounds like an anti-pattern called General Fixture, where you try to pre-load a set of data that represents as broad a set of scenarios as possible. However, it makes it very hard to test mutually exclusive conditions, and may also lead to Test Interdependence if you modify this pre-loaded data in some tests.

This is really well explained in the excellent book xUnit Test Patterns.

Mark Seemann
Thanks for your response Mark. With the exceptions of rebuilding the database schema after each test and cleaning the database at the end¹, you are describing what I do. Before **each** test I clean and build the data used for just that test. The problem comes when the data used by the test (held in XML documents) is out of sync with the database schema. (We have a large number of apps all accessing the same database; I am only responsible for a few of them, which end up reading some data produced by apps outside my control.)
¹ Having the data still there at the end of a test is useful for debugging.
mlk
+1  A: 

I have the same issue of dbUnit flat XML files getting out of sync with database schema evolutions that require data changes (even for things as simple as the addition of mandatory columns).

While transforming all the XML files using some hand-written scripts is an option, I still think the issue should be solved at a different abstraction level, more similar to the way one handles live data: evolutionary database design.

Database migration tools already know about delta scripts, so having a sort of dbUnit adapter would be great.

I found the following blog entry covering the issue: http://blog.liquibase.org/2007/06/unit-testing-the-database-access-layer.html

To solve the problem of keeping test data definitions from getting out of sync with the schema, you need to have your test data built up along with your database so it will be modified by database refactorings that were made after it was initially created. [..] By including the test data with your database changes, the data is automatically kept up in the same way the production data would be. Using this technique over the dataset per method also has the advantage of performing better because the data is only inserted once,...

but he adds:

but it has the disadvantage that you need to handle all the test data that any method would want in one place.

...which in turn is not possible for more complex scenarios, I guess. He goes on to say:

To facilitate this technique, I built the idea of execution contexts into LiquiBase so you can mark the test data changes and only apply them in environments where you run unit tests. So far I have been happy with the results. The tests fail when there is a difference between the database schema and what the code is expecting or when there is a bug in my SQL and I haven’t lost any tests due to database refactorings.

Here's the link: www.liquibase.org/manual/contexts, but it's not quite what I want. Although I would be fine with exposing my test data to a db migration tool, I would still like to keep it very close to the database test.
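
For reference, a context-scoped test-data changeset might look roughly like this (table, column, and id names invented for illustration; it is only applied when LiquiBase runs with the "test" context):

<changeSet id="100-test-data" author="example" context="test">
    <insert tableName="tIssue">
        <column name="issue_id" valueNumeric="1"/>
        <column name="title" value="Sample issue"/>
    </insert>
</changeSet>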

Thoughts anyone?

jonaskilian
Axel, although Flyway looks promising, I'm not sure whether step 3 is feasible: feeding database content back into existing dbUnit test data files seems complex if you don't want to bloat them, because you'd have to take care of: filtering generated values (IDs, triggers etc.); transforming ReplacementDataSets back by changing replaced values back to the placeholder name; and - most notably - if the relation between a test case and the test data in charge is not 1:1, that is, the input comes from different XML datasets forming a composition of test data, you'd have to update different XML files.
jonaskilian
Note to myself: question what you're trying to do. Do not use dbUnit XML files at all; rather, let the tests express what they need via nice, fluent Java factory calls instead of relying on scattered XML files containing test data and - even worse - being reused across different tests, making it impossible to know later on why a specific dataset was used at all.
jonaskilian
@jonaskilian - I've been playing about with a semi-fluent code-based alternative to dbUnit (dbUnit.NET does not sit quite right with me). It is great when the dataset is small; alas, the dataset often ends up bloating quite a bit (the dataset has an FK on an agreement, which has an FK on a client, etc.) and then it is simply too much to do in code.
mlk
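
A rough sketch of the builder idea from the comment above, where each builder carries valid defaults for its own columns and for its foreign-key parents so a test only overrides what it cares about (all table, column, and class names are invented):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class IssueBuilder {
    private String title = "default issue"; // valid default for the NOT NULL column

    IssueBuilder withTitle(String title) { this.title = title; return this; }

    long insertInto(Connection c) throws SQLException {
        try (PreparedStatement ps = c.prepareStatement(
                "insert into tIssue (title) values (?)", Statement.RETURN_GENERATED_KEYS)) {
            ps.setString(1, title);
            ps.executeUpdate();
            try (ResultSet keys = ps.getGeneratedKeys()) { keys.next(); return keys.getLong(1); }
        }
    }
}

class RevisionBuilder {
    private IssueBuilder issue = new IssueBuilder(); // FK parent defaulted away
    private String comment = "default comment";

    RevisionBuilder withIssue(IssueBuilder issue) { this.issue = issue; return this; }

    long insertInto(Connection c) throws SQLException {
        long issueId = issue.insertInto(c); // parents inserted first to satisfy the FK
        try (PreparedStatement ps = c.prepareStatement(
                "insert into tRevision (issue_id, comment) values (?, ?)",
                Statement.RETURN_GENERATED_KEYS)) {
            ps.setLong(1, issueId);
            ps.setString(2, comment);
            ps.executeUpdate();
            try (ResultSet keys = ps.getGeneratedKeys()) { keys.next(); return keys.getLong(1); }
        }
    }
}

A test that only cares about the revision can then just write new RevisionBuilder().insertInto(connection), and a new mandatory column has to be handled in exactly one place: the builder that owns it.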
+1  A: 

Well, as I see it, it is a matter of combining what is already there.

The scenario described above:

  1. Write a database migration
  2. Apply the database migration (manually or automatically at the start of the test run)
  3. Watch your tests break due to a constraint violation (not null)

You could extend it so that you have a small program that does the following:

  1. Populate the database with the DbUnit XML
  2. Apply the database migration
  3. Extract the contents of your database in-place back into your DbUnit XML (and optionally also the DTD); see DbUnit Home Page -> DbUnit FAQ -> "How to extract a flat XML dataset from my database?", and the sketch after this list
  4. Check your updated DbUnit XML (and DTD) into source control.
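
Step 3 could look roughly like the following sketch, based on the export example in the DbUnit FAQ (the JDBC URL and file names are placeholders):

import java.io.FileOutputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import org.dbunit.database.DatabaseConnection;
import org.dbunit.database.IDatabaseConnection;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.xml.FlatDtdDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSet;

// Dumps the migrated database back into a flat XML dataset (and DTD).
public class DatasetExporter {
    public static void main(String[] args) throws Exception {
        Connection jdbc = DriverManager.getConnection("jdbc:h2:mem:test", "sa", "");
        IDatabaseConnection connection = new DatabaseConnection(jdbc);
        IDataSet fullDataSet = connection.createDataSet(); // every table dbUnit can see
        FlatXmlDataSet.write(fullDataSet, new FileOutputStream("load/full-dataset.xml"));
        FlatDtdDataSet.write(fullDataSet, new FileOutputStream("load/dataset.dtd"));
    }
}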

For applying the migration, I heartily recommend Flyway. It supports both SQL (with placeholder replacement) and Java-based migrations. You can then apply the migrations using the Maven plugin or programmatically through the API. The latter fits this case perfectly.
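
Programmatically, applying the migrations is only a few lines. A sketch, assuming a newer Flyway version with the fluent configure() API and a placeholder data source:

import org.flywaydb.core.Flyway;

public class ApplyMigrations {
    public static void main(String[] args) {
        // Points Flyway at the test database and applies any pending migrations.
        Flyway flyway = Flyway.configure()
                .dataSource("jdbc:h2:mem:test", "sa", "")
                .load();
        flyway.migrate();
    }
}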

The complete workflow then becomes:

  1. Write your database migration
  2. Execute your DbUnitXmlDtdUpdater program
  3. Watch your unit tests pass

Happy days,

Axel

Disclaimer: I am one of Flyway's developers.

Axel Fontaine
Jonas, surely my proposed answer is not one-size-fits-all. Let me address your points. Generated values: the real question is, do you really want generated values in your test data? I lean towards keeping my test data as stable as possible (for repeatable tests), avoiding generated values wherever I can. If you must use generated values, then DbUnit's DefaultColumnFilter can help. For triggers, a case-by-case analysis is required to see if or how their effects can be compensated. Splitting the dataset into several files means dealing with DbUnit's DefaultTableFilter to divide up the test data.
Axel Fontaine
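
For reference, a minimal sketch of the DefaultColumnFilter approach mentioned in that last comment (the column names are invented; expected and actual are dbUnit ITables obtained elsewhere):

import org.dbunit.Assertion;
import org.dbunit.DatabaseUnitException;
import org.dbunit.dataset.ITable;
import org.dbunit.dataset.filter.DefaultColumnFilter;

public class ComparisonHelper {
    // Drops generated columns before comparing, so IDs and trigger-maintained
    // values do not break the assertion.
    static void assertEqualsIgnoringGenerated(ITable expected, ITable actual)
            throws DatabaseUnitException {
        ITable filtered = DefaultColumnFilter.excludedColumnsTable(
                actual, new String[] { "id", "last_modified" });
        Assertion.assertEquals(expected, filtered);
    }
}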