views:

214

answers:

3

I would like to ask about your suggestions concerning unit testing against large databases.

I want to write unit tests for an application which is mostly implemented in T-SQL so mocking the database is not an option. The database is quite large (approx. 10GB) so restoring the database after a test run is also practically impossible.

The application's purpose is to manage the handling of applications for credit agreements. There are users in specific roles that change the state of agreement objects and my job is to test part of this process.

I'm considering two approaches:

First Approach

Create agreements that meet specific conditions and then test changes of agreement state (eg. transition from waiting in some office to handled in this specific office). The agreements will be created in application itself and they will be my test cases. All the tests will be in transactions that would be rolled back after performing these tests.

Advantages

The advantage of this approach is quite straightforward test. Expected data could be easily described because I exactly know how the object should look like after the transition.

Disadvantages

The disadvantage is that the database cannot change in a way that will break the test. Users and agreements used in test cases must always look the same and if there will be a need to change the database the preparation process will have to be repeated.


Second Approach

Create agreements in unit tests. Programatically create agreements that would meet specific conditions. The data used for creating agreement will be chosen randomly. Also the users that will change agreement state will be created randomly.

Advantages

The advantage of this approach is ease of making changes to objects and ability to run tests on databases with different data.

Disadvantages

Both objects (agreement and user) have lots of fields and related data and I'm afraid it would take some time to implement creation of these objects (I'm also afraid that these objects may contain some errors because the creation method will be quite hard to implement without errors).


What do you think about these two approaches?

Do any Stack Overflow readers think it is worth the effort create objects as described in second approach?

Does anyone here have any experience creating such tests?

+3  A: 

I'm not sure I entirely agree with your assumption that you cannot restore the database after a test run. While I definitely agree that some tests should be run on a full-size, multi-TB database, I don't see why you can't run most of your tests on a much, much smaller test database. Are there constraints that need to be tested like "Cannot be more than a billion identical rows?"

My recommendation would actually be to use a smaller test database for most of your functional specs, and to create-drop all of its tables with each test, with as little sample data as is necessary to test your functionality.

CaptainAwesomePants
creating scripts for getting rid of unecessary data and making sure that everything is ok on smaller db would take too much time. this approach would be basically the same to first approach because if the database changes i will still have to create data for tests in new db.
empi
We do this where I work. It can be pretty nice. For postgres all you really need to do is run pg_dump with the right flags to get the data you need and the schema. For 10gb of data you naturally might only want to take a subset :)
fuzzy-waffle
@empi: "...would take too much time." That time is amortized across the development of the project and is made up many times over with the ability to have regression testing and less defects.
SnOrfus
+1  A: 

How about testing everything in a transaction and then roll it back? E.g:

BeginTransaction
DoThings
VerifyResult
RollbackTransaction
Cellfish
+1  A: 

For creating fixture data for tests, you have a few choices:

(a) Create a script that creates an empty database, and then adds a small number of records as the fixture data. This data can be hand-constructed, or a few records from the real database. This is the Rails approach, and pretty common in the Java world.

(b) It's also common to use a "factory" to create this data (some sort of application code). There is an initial investment in building these classes, but once they are built they can be re-used for all your tests. This is now very popular in Ruby/Rails code. (This is your Second Approach above.)

(c) Of course you can use a copy of the "production" data, and try to test against that. But this is probably the hardest approach as you will always be competing against he real world changing the data. And it also tends to be orders of magnitude slower than a small set of fixture data.

There's definitely a cost of getting from state (c) to state (a) or (b)-- but it is an investment in the future. It won't take that long-- even if it takes a whole day, the speed-up in the test running will make up for it quickly.

There's a someone independent issue. After you get your data into the database, and then run your tests, you need to restore it. There are a few common approaches:

(1) rollback the transaction. This is a great way to go-- if practical. Sometimes, though, you actually need to confirm that transactions completed, so this doesn't work.

(2) just re-load a new set of fixture data. If your fixture data is small this is workable. A little slower than (1).

(3) manually undo what the tests have done. This is the most error prone and difficult approach, but possible.

Recommendation?

It sounds like your application is complicated. I'd recommend hand-crafting a small set of data for your tests (a). Keep it separate from your main database so that it's easier to keep track of and reload. Try to rollback transactions, but if that doesn't work for you, you can reload from a script before each test (remember-- the data is small).

The other piece of the puzzle is database migrations, if you don't already have that nailed. These are scripts that use use to evolve your database. If you have these organized and automated, you can apply them to your test/fixture data as well as your production data.

ndp