I've read all the SO questions and the Coding Horror articles, and Googled my brains out searching for the best ways to revision-control data. They all work, and they all have their appropriate implementations based on use cases and so on. What I really want to know is why no database has been written to natively support revisioning at the data level.

What baffles me is that the API is already practically in place with transactions. We start a transaction, change some data, and commit. We authenticate against the database too, so blame is present. My company stores end-of-month versions of our entire database for accounting purposes, which equate to tags. Does this not scream RCS?

Branching is something that databases could benefit from greatly too, for schema more than for data. Since I really only care about data, and branching would increase the difficulty of implementation by a massive degree, I'll stick to just tags and commits.

Now, I know that databases are incredibly time-critical applications, so any unnecessary overhead is shunned into oblivion, and some databases are epic-level huge, so revisions will only multiply that size. Still, a per-table, opt-in revision control undoubtedly has a place in small to medium scale environments where there are milliseconds to spare and data history has a degree of importance. I want commits, I want logs, I want reverts, I want diffs, I want blame, I want tags, and I want checkouts. I want MF-ing revision control.

I have a question in there somewhere...

A: 

You forgot "I want performance." A DBMS is a pretty low-level data storage mechanism, and in systems with billions of rows, performance can be important. Therefore, if you want this sort of auditing system, you can build it yourself using the tools available to you (e.g. triggers).

Just as not all files in a filesystem are appropriate for version control, not all rows in a database would be appropriate for version control either.
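To make the trigger suggestion concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `account` and `account_history` tables, the trigger names, and the `make_db` and `log` helpers are all made up for illustration; a real audit table would likely also record who made the change, which SQLite alone cannot tell you.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one audited (opt-in) table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE account (
            id INTEGER PRIMARY KEY,
            balance INTEGER NOT NULL
        );
        -- Hypothetical history table: one row per change to account.
        CREATE TABLE account_history (
            rev INTEGER PRIMARY KEY AUTOINCREMENT,
            id INTEGER,
            old_balance INTEGER,
            new_balance INTEGER,
            op TEXT,
            changed_at TEXT DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TRIGGER account_ins AFTER INSERT ON account BEGIN
            INSERT INTO account_history (id, old_balance, new_balance, op)
            VALUES (NEW.id, NULL, NEW.balance, 'INSERT');
        END;
        CREATE TRIGGER account_upd AFTER UPDATE ON account BEGIN
            INSERT INTO account_history (id, old_balance, new_balance, op)
            VALUES (NEW.id, OLD.balance, NEW.balance, 'UPDATE');
        END;
        CREATE TRIGGER account_del AFTER DELETE ON account BEGIN
            INSERT INTO account_history (id, old_balance, new_balance, op)
            VALUES (OLD.id, OLD.balance, NULL, 'DELETE');
        END;
    """)
    return conn


def log(conn, account_id):
    """Return the change log for one row, oldest revision first."""
    return conn.execute(
        "SELECT op, old_balance, new_balance FROM account_history "
        "WHERE id = ? ORDER BY rev",
        (account_id,),
    ).fetchall()
```

Only tables that get triggers pay the extra write per change, which is the per-table opt-in. Note that each history row covers a single row change; grouping several changes into one "commit" would need something extra, such as a revision id shared across the rows written in one transaction.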

Greg Hewgill
I have gone the trigger route, which is okay, and I have done it fully in application logic. Both definitely work, but I just don't know why no native implementation exists. I would like to see a revision comprise changes to more than one row, and even more than one table, however. And as I said, this would have to be an opt-in, per-table addition, as performance is always important. I guess the fact that this couldn't be done effectively on a large deployment is why it has never been implemented in mainstream DBs. Boy, would a few of my smaller, more specialized web sites and applications benefit, though.
Jake Wharton
+1  A: 

You could read about temporal databases.

In "Temporal Data & the Relational Model" by Date, Darwen, and Lorentzos, the authors introduce a sixth normal form to account for issues in tracking temporal data.

Richard Snodgrass proposed TSQL2 as an extension to SQL to handle temporal data.
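To give a feel for what "temporal data" means in practice, here is a rough sketch in plain SQL driven from Python's sqlite3 module. It is not TSQL2 syntax and not the sixth-normal-form design from the book; it is just the common valid-from/valid-to pattern. The `price_history` table, the `OPEN_END` sentinel, and the helper functions are hypothetical names chosen for this example.

```python
import sqlite3

# Sentinel end date meaning "this row is still current" (an assumption
# of this sketch; some designs use NULL instead).
OPEN_END = "9999-12-31"


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE price_history (
            item TEXT NOT NULL,
            price INTEGER NOT NULL,
            valid_from TEXT NOT NULL,  -- inclusive, ISO date
            valid_to TEXT NOT NULL     -- exclusive, ISO date
        )""")
    return conn


def set_price(conn, item, price, as_of):
    """Close the current row for item (if any) and open a new one."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE price_history SET valid_to = ? "
            "WHERE item = ? AND valid_to = ?",
            (as_of, item, OPEN_END),
        )
        conn.execute(
            "INSERT INTO price_history VALUES (?, ?, ?, ?)",
            (item, price, as_of, OPEN_END),
        )


def price_as_of(conn, item, when):
    """Return the price that was valid on the given ISO date, or None."""
    row = conn.execute(
        "SELECT price FROM price_history "
        "WHERE item = ? AND valid_from <= ? AND ? < valid_to",
        (item, when, when),
    ).fetchone()
    return row[0] if row else None
```

ISO-formatted dates compare correctly as strings, which is why plain `<=` and `<` work here. A temporal DBMS would maintain the validity periods and answer "as of" queries for you rather than leaving them to the application.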

Implementations include:

Bill Karwin
I'm surprised I have never heard of this, but from a quick read of the Wikipedia article it looks very close to what I had envisioned in my head. I'm not looking for anything production ready, just something to hack around with.
Jake Wharton
+2  A: 

One native solution is Oracle's Flashback Data Archive (aka Total Recall). It is a chargeable extra to the Enterprise Edition, but it is pretty cool. It transparently stores versions of the data for as long as we want to retain them, and supplies syntax to query old versions of the data. It can be enabled on a table-by-table basis.

Essentially, Flashback Data Archive is like using triggers to store records in tracking tables, but slick, performant and invisible to normal working.

APC
A: 

Several DBMSs implement engine-level versioning mechanisms. Unfortunately, there is no vendor-independent standard for this, so they are all proprietary. Oracle Flashback has already been mentioned. Microsoft's Change Data Capture feature in SQL Server is another one.

dportas