ansaurus

Question

Versioning in SQL Tables - how to handle it?

Answer 1

+6 A:

I think you've started down the wrong path.

Typically, for versioning or storing historical data, you do one or both of two things:

1. You have a separate history table that mirrors the original table plus a date/time column recording when the row was changed. Whenever a record is updated, you insert the existing contents into the history table just prior to the update.
2. You have a separate warehouse database. In this case you can either version it just like in #1 above or simply snapshot it every so often (hourly, daily, weekly...).
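
A minimal sketch of option #1, run against SQLite through Python's sqlite3 module purely so it can be executed as-is. The table name EmployeeHistory, the ChangedAt column, and the update_position helper are illustrative assumptions, not anything from the question; the point is only that the copy and the update happen in one transaction.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Employee (
            ID       INTEGER PRIMARY KEY,
            Name     TEXT NOT NULL,
            Position TEXT NOT NULL
        );
        -- Hypothetical history table: same columns plus the change time.
        -- ID is not a key here, since one employee can have many old rows.
        CREATE TABLE EmployeeHistory (
            ID        INTEGER NOT NULL,
            Name      TEXT NOT NULL,
            Position  TEXT NOT NULL,
            ChangedAt TEXT NOT NULL
        );
    """)
    return conn


def update_position(conn, emp_id, new_position):
    # Copy the existing row into history, then update the live row.
    # "with conn" commits both statements together or rolls both back.
    with conn:
        conn.execute(
            """INSERT INTO EmployeeHistory (ID, Name, Position, ChangedAt)
               SELECT ID, Name, Position, datetime('now')
               FROM Employee WHERE ID = ?""",
            (emp_id,),
        )
        conn.execute(
            "UPDATE Employee SET Position = ? WHERE ID = ?",
            (new_position, emp_id),
        )


conn = make_db()
conn.execute("INSERT INTO Employee VALUES (1, 'Ann', 'Coder')")
update_position(conn, 1, "Lead")

live_rows = conn.execute("SELECT ID, Name, Position FROM Employee").fetchall()
old_rows = conn.execute("SELECT ID, Name, Position FROM EmployeeHistory").fetchall()
print(live_rows)
print(old_rows)
```

The production table keeps exactly one row per employee, so normal queries never have to filter by version.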

Keeping your versioned rows in the same table as your live ones has several problems. First, the table size is going to grow like crazy. This will put constant pressure on normal production queries.

Second, it's going to radically increase your query complexity for joins etc., because every query has to make sure the latest version of each record is being used.

Chris Lively 2010-09-22 19:39:09

Answer 2

+4 A:

Here is my suggested approach, which has worked very well for me in the past:

Forget the version number. Instead, use StartDate and EndDate columns
Write a trigger to ensure that there are no overlapping date ranges for the same ID, and that there is only ever one record with a NULL EndDate for the same ID (this is your currently effective record)
Put indexes on StartDate and EndDate; this should give you reasonable performance

This will easily let you report by date:

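select *
from MyTable
-- half-open range; a NULL EndDate marks the currently effective record
where MyReportDate >= StartDate
  and (EndDate is null or MyReportDate < EndDate)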

or get the current info:

select *
from MyTable 
where EndDate is null
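
Here is a rough, runnable sketch of the StartDate/EndDate scheme, using SQLite through Python's sqlite3 module as a stand-in for whatever database you are actually on. The Position column, the trigger and index names, and the change_position and as_of helpers are all assumptions made up for illustration. Ranges are treated as half-open, and the trigger only guards INSERT; a fuller version would guard UPDATE as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (
        ID        INTEGER NOT NULL,
        Position  TEXT NOT NULL,
        StartDate TEXT NOT NULL,  -- ISO dates compare correctly as text
        EndDate   TEXT            -- NULL = currently effective record
    );
    CREATE INDEX ix_MyTable_dates ON MyTable (ID, StartDate, EndDate);

    -- Reject a new row whose [StartDate, EndDate) range overlaps an
    -- existing row for the same ID. Two open-ended rows always overlap,
    -- so this also allows at most one NULL EndDate per ID.
    CREATE TRIGGER trg_MyTable_no_overlap
    BEFORE INSERT ON MyTable
    BEGIN
        SELECT RAISE(ABORT, 'overlapping date range for this ID')
        WHERE EXISTS (
            SELECT 1 FROM MyTable AS t
            WHERE t.ID = NEW.ID
              AND (t.EndDate IS NULL OR NEW.StartDate < t.EndDate)
              AND (NEW.EndDate IS NULL OR t.StartDate < NEW.EndDate)
        );
    END;
""")


def change_position(conn, emp_id, new_position, effective):
    # Close the current record, then open a new one, in one transaction.
    with conn:
        conn.execute(
            "UPDATE MyTable SET EndDate = ? WHERE ID = ? AND EndDate IS NULL",
            (effective, emp_id),
        )
        conn.execute(
            "INSERT INTO MyTable (ID, Position, StartDate, EndDate) "
            "VALUES (?, ?, ?, NULL)",
            (emp_id, new_position, effective),
        )


def as_of(conn, emp_id, report_date):
    # Return the position that was effective on report_date, or None.
    row = conn.execute(
        """SELECT Position FROM MyTable
           WHERE ID = ? AND StartDate <= ?
             AND (EndDate IS NULL OR ? < EndDate)""",
        (emp_id, report_date, report_date),
    ).fetchone()
    return row[0] if row else None


change_position(conn, 1, "Coder", "2010-01-01")
change_position(conn, 1, "Lead", "2010-09-01")

# An insert that overlaps the existing 'Coder' range is refused.
try:
    conn.execute(
        "INSERT INTO MyTable VALUES (1, 'Tester', '2010-03-01', '2010-04-01')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(as_of(conn, 1, "2010-06-15"), as_of(conn, 1, "2010-09-01"), rejected)
```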

RedFilter 2010-09-22 19:39:56

Answer 3

+2 A:

Idea 3 will work:

SELECT * FROM Employee AS e1
WHERE e1.Position = 'Coder'
AND e1.Version = (
    -- latest version number for this employee
    SELECT MAX(e2.Version) FROM Employee AS e2
    WHERE e2.ID = e1.ID)

You really want to use something like a date though (say, an EffectiveDate column), which is much easier to program and track and uses the same logic.

EDIT:

Chris is totally correct about moving this info out of your production table for performance, especially if you expect frequent updates. Another option would be to build a VIEW on top of this table that shows only the most recent version of each person's info.
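
Something along these lines would do for the view, sketched here against SQLite via Python's sqlite3 module so it runs as-is. The view name CurrentEmployee, the Name column, and the sample rows are made up for illustration; only ID, Version, and Position come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (
        ID       INTEGER NOT NULL,
        Version  INTEGER NOT NULL,
        Name     TEXT NOT NULL,
        Position TEXT NOT NULL,
        PRIMARY KEY (ID, Version)
    );

    -- Hypothetical view: one row per ID, the one with the highest Version.
    CREATE VIEW CurrentEmployee AS
    SELECT e1.ID, e1.Version, e1.Name, e1.Position
    FROM Employee AS e1
    WHERE e1.Version = (
        SELECT MAX(e2.Version) FROM Employee AS e2
        WHERE e2.ID = e1.ID
    );
""")

conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Ann", "Tester"),
        (1, 2, "Ann", "Coder"),    # Ann's latest version
        (2, 1, "Bob", "Coder"),
        (2, 2, "Bob", "Manager"),  # Bob's latest version, no longer a coder
    ],
)

# Callers query the view and never mention Version at all.
current_coders = conn.execute(
    "SELECT ID, Name, Version FROM CurrentEmployee "
    "WHERE Position = 'Coder' ORDER BY ID"
).fetchall()
print(current_coders)
```

Bob's old 'Coder' row is filtered out because the view only exposes his version 2 row.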

JNK 2010-09-22 19:41:43

Your solution is technically correct, but I'd be concerned about the performance of that correlated subquery.

Joe Stefanelli 2010-09-22 19:45:44

@Joe - it's not going to be blazing fast on 100m rows to be sure. If he has a covering index on ID+Version it should be reasonably quick.

JNK 2010-09-22 19:47:25

Performance problem noted - it will for sure have frequent updates (or at least, many of the tables in the framework need to be prepared for the frequent updates). See my edit for

glowcoder 2010-09-22 19:52:58

+1 for the idea about using views. I still wouldn't do that, but it's certainly a workable option depending on other constraints.

Chris Lively 2010-09-22 21:36:28

Answer 4

A:

You are definitely doing this wrong. Keeping a database running sweetly requires that you hold only the minimum amount of data you need in your production tables. Holding historical data in with the live data inevitably adds redundancy that will complicate queries and slow performance, plus your successors are going to look really askance at this before submitting it to the DailyWTF!

Instead, create a copy of the table (EmployeeHistorical, for instance) but with the ID column not set as identity (you might choose to add an additional new ID column and a dateCreated timestamp column too). Then add a trigger to your Employee table that fires on update and delete and writes out a copy of the complete row to the Historical table. And while you're at it, capturing the ID of the user doing the edit often comes in handy for audit purposes.
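
As a rough illustration of the trigger approach, here is a sketch using SQLite through Python's sqlite3 module. EmployeeHistorical and dateCreated are the names suggested above; HistoryID, Action, ModifiedBy, and the trigger names are assumptions added for the example. ModifiedBy is a column the application is assumed to fill in, since SQLite has no built-in notion of the current user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (
        ID         INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL,
        Position   TEXT NOT NULL,
        ModifiedBy TEXT NOT NULL   -- set by the application on every write
    );

    -- Copy of Employee where ID is a plain column, plus its own key,
    -- the kind of change, and a creation timestamp.
    CREATE TABLE EmployeeHistorical (
        HistoryID   INTEGER PRIMARY KEY AUTOINCREMENT,
        ID          INTEGER NOT NULL,
        Name        TEXT NOT NULL,
        Position    TEXT NOT NULL,
        ModifiedBy  TEXT NOT NULL,
        Action      TEXT NOT NULL,
        dateCreated TEXT NOT NULL DEFAULT (datetime('now'))
    );

    -- Write the complete old row to history whenever it is updated.
    CREATE TRIGGER trg_Employee_update AFTER UPDATE ON Employee
    BEGIN
        INSERT INTO EmployeeHistorical (ID, Name, Position, ModifiedBy, Action)
        VALUES (OLD.ID, OLD.Name, OLD.Position, OLD.ModifiedBy, 'UPDATE');
    END;

    -- Same for deletes, so the last state of the row is not lost.
    CREATE TRIGGER trg_Employee_delete AFTER DELETE ON Employee
    BEGIN
        INSERT INTO EmployeeHistorical (ID, Name, Position, ModifiedBy, Action)
        VALUES (OLD.ID, OLD.Name, OLD.Position, OLD.ModifiedBy, 'DELETE');
    END;
""")

conn.execute("INSERT INTO Employee VALUES (1, 'Ann', 'Coder', 'hr_admin')")
conn.execute(
    "UPDATE Employee SET Position = 'Lead', ModifiedBy = 'manager1' WHERE ID = 1"
)
conn.execute("DELETE FROM Employee WHERE ID = 1")

history = conn.execute(
    "SELECT ID, Position, ModifiedBy, Action "
    "FROM EmployeeHistorical ORDER BY HistoryID"
).fetchall()
print(history)
```

Each history row holds the state being replaced, so its ModifiedBy value is whoever wrote that earlier state, not the user making the new edit.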

Generally, when I'm doing this on an active table, I try to create the historical table in a different database. Among other things, this reduces fragmentation (and hence maintenance) on your prime database, and it makes backups easier to handle, as archives can grow very large.

Your issues about edit contention should be handled with the normal database transaction and locking mechanisms. Coding up ad hoc hacks to emulate these yourself is always time-consuming and error-prone (some edge condition you've not thought of always pops up, and to write locks correctly you've really got to grok semaphores, which is decidedly non-trivial).

Cruachan 2010-09-22 20:09:45
