views:

267

answers:

1

I've been thinking about how to apply data versioning to a relatively simple database I have, and figured I should do something like is mentioned in Jim T's post where there are global revision #s (e.g. like in Subversion or Mercurial) and each database record has a validity interval.

Example:

Create a person.

|Name|D.O.B  |Telephone|From|To  |
|Fred|1 april|555-29384|1   |NULL|

Update tel no.

|Name|D.O.B  |Telephone|From|To  |
|Fred|1 april|555-29384|1   |1   |
|Fred|1 april|555-43534|2   |NULL|

Delete fred:

|Name|D.O.B  |Telephone|From|To  |
|Fred|1 april|555-29384|1   |1   |
|Fred|1 april|555-43534|2   |2   |

Are there downsides to this approach? It doesn't seem that complicated.

The only thing I can think of, is that it seems like it would have subtle effects on primary keys other than an autoincremented record number which is independent and unrelated to the data. For example, if you had data like this:

 Person: (primary key = PersonID which is an autoincrementing integer)
 |PersonID|Name|Telephone|
 |1       |Fred|555-2938|
 |2       |Lois|555-2939|
 |3       |Jim |555-1000|

 Home: (primary key = HomeID which is an autoincrementing integer)
 |HomeID|Address       |
 |1     |123 Elm St.   |
 |2     |456 Maple Ave.|

 PersonHome: (primary key = person ID and home ID)
 |PersonID|HomeID|
 |1       |1     |
 |2       |1     |
 |3       |2     |

then you can't just add the From and To fields above, since you ruin the uniqueness of the primary keys. Instead I would probably have to do something like this (with appropriate indices added to replace the function of the preceding primary keys):

 Person: (primary key = K which is an autoincrementing integer)
 |K|PersonID|Name     |Telephone|From|To  |
 |1|1       |Fred     |555-2938 |1   |NULL|
 |2|2       |Lois     |555-2939 |1   |1   |
 |3|3       |Jim      |555-1000 |1   |NULL|
 |4|4       |Sunshine |555-2000 |1   |2   |
 |5|2       |Lois     |555-1000 |2   |NULL|
 |6|4       |Daisy May|555-2000 |3   |NULL|
 |7|5       |Connor   |         |5   |NULL|

 Home: (primary key = K which is an autoincrementing integer)
 |K|HomeID|Address       |From|To  |
 |1|1     |123 Elm St.   |1   |NULL|
 |2|2     |456 Maple Ave.|1   |NULL|
 |3|3     |789 Vista Dr. |1   |3   |
 |4|3     |104 Vista Dr. |4   |NULL|   

 PersonHome: (primary key = K which is an autoincrementing integer)
 |K|PersonID|HomeID|From|To  |
 |1|1       |1     |1   |NULL|
 |2|2       |1     |1   |1   |
 |3|3       |2     |1   |NULL|
 |4|4       |3     |1   |NULL|
 |5|2       |2     |2   |NULL|
 |6|5       |2     |5   |NULL|

 Revisions: (comments here for illustration)
 |Revision|Comments                                                       |
 |1       |Initial dataset                                                |
 |2       |Lois divorced Fred and moved in with Jim                       |
 |3       |Sunshine changed her name to Daisy May                         |
 |4       |Daisy May's house was renumbered by the fire dept for 911 rules|
 |5       |Lois and Jim had a baby named Connor                           |
+1  A: 

Pondering the same issue myself!

the from|to construct has the disadvantage of requiring an update and an insert when you want to add a new version of the record. I'm also concerned that this construct has dual meaning, it tells you the version, e.g. x | null means most recent, and it tells you the record status, e.g. x | x means deleted.

Historical wisdom suggests using dates http://www.dbforums.com/database-concepts-design/1641734-data-record-versioning-how-implement.html

Working with dates is not the neatest thing to do when you pull your data into your object model.

More recently the two table approach has been favoured http://stackoverflow.com/questions/762405/database-data-versioning

I'm not too keen on this either due to the need to maintain duplicate tables.

If you modified your solution to have a version column and a status column would that work? Highest valued version number is most recent, status of record is in the most recent version.

Still pondering ...

Paul
"If you modified your solution to have a version column and a status column would that work?" Perhaps, but then I'd need to find the row with the maximum revision ID that does not exceed the revision of interest. I would much rather do a query with BETWEEN(from,to); it's quicker and avoids subqueries.
Jason S