views:

1144

answers:

8

(Not related to versioning the database schema)

Applications that interfaces with databases often have domain objects that are composed with data from many tables. Suppose the application were to support versioning, in the sense of CVS, for these domain objects.

For some arbitry domain object, how would you design a database schema to handle this requirement? Any experience to share?

+7  A: 

A technique I've used for this in that past has been to have a concept of "generations" in the database, each change increments the current generation number for the database - if you use subversion, think revisions. Each record has 2 generation numbers associated with it (2 extra columns on the tables) - the generation that the record starts being valid for, and the generation the it stops being valid for. If the data is currently valid, the second number would be NULL or some other generic marker.

So to insert into the database:

  1. increment the generation number
  2. insert the data
  3. tag the lifetime of that data with valid from, and a valid to of NULL

If you're updating some data:

  1. mark all data that's about to be modified as valid to the current generation number
  2. increment the generation number
  3. insert the new data with the current generation number

deleting is just a matter of marking the data as terminating at the current generation.

To get a particular version of the data, find what generation you're after and look for data valid between those generation versions.

Example:

Create a person.

|Name|D.O.B  |Telephone|From|To  |
|Fred|1 april|555-29384|1   |NULL|

Update tel no.

|Name|D.O.B  |Telephone|From|To  |
|Fred|1 april|555-29384|1   |1   |
|Fred|1 april|555-43534|2   |NULL|

Delete fred:

|Name|D.O.B  |Telephone|From|To  |
|Fred|1 april|555-29384|1   |1   |
|Fred|1 april|555-43534|2   |2   |
Jim T
+1. I was thinking along these lines but was unsure; it's very reassuring to see that someone has used this technique successfully.
Jason S
+1  A: 

You'll need a master record in a master table that contains the information common among all versions.

Then each child table uses master record id + version no as part of the primary key.

It can be done without the master table, but in my experience it will tend to make the SQL statements a lot messier.

Roy Tang
+2  A: 

An alternative to strict versioning is to split the data into 2 tables: current and history.

The current table has all the live data and has the benefits of all the performance that you build in. Any changes first write the current data into the associated "history" table along with a date marker which says when it changed.

Jim T
+1  A: 

See this existing question

Matthew Watson
+8  A: 

Think carefully about the requirements for revisions. Once your code-base has pervasive history tracking built into the operational system it will get very complex. Insurance underwriting systems are particularly bad for this, with schemas often running in excess of 1000 tables. Queries also tend to be quite complex and this can lead to performance issues.

If the historical state is really only required for reporting, consider implementing a 'current state' transactional system with a data warehouse structure hanging off the back for tracking history. Slowly Changing Dimensions are a much simpler structure for tracking historical state than trying to embed an ad-hoc history tracking mechanism directly into your operational system.

Also, Changed Data Capture is simpler for a 'current state' system with changes being done to the records in place - the primary keys of the records don't change so you don't have to match records holding different versions of the same entity together. An effective CDC mechanism will make an incremental warehouse load process fairly lightweight and possible to run quite frequently. If you don't need up-to-the minute tracking of historical state (almost, but not quite, and oxymoron) this can be an effective solution with a much simpler code base than a full histoy tracking mechanism built directly into the application.

ConcernedOfTunbridgeWells
Just read the SCD page. Lots of food for thought. The alternative Type 6 looks like it is hitting close to home for me.
Christian Vest Hansen
Type 2/6 (a type 6 is just a type 2 with a self join to the most recent version) dimensions are probably what you want here. Look into using changed data capture if you want to update the warehouse frequently.
ConcernedOfTunbridgeWells
I'd like to say that although this was two years ago (almost to the day), this answer was very useful! Thank you!
glowcoder
A: 

A simple fool-proof way, is to add a version column to your tables and store the Object's version and choose the appropriate application logic based on that version number. This way you also get backwards compatibility for little cost. Which is always good

Robert Gould
A: 

ZoDB + ZEO implements a revision based database with complete rollback to any point in time support. Go check it.

Bad Part: It's Zope tied.

lms
+1  A: 

If you are using Hibernate JBoss Envers could be an option. You only have to annotate classes with @Audited to keep their history.

deamon