This is a screenshot from the Google BigTable paper.

In what kinds of scenarios would I need to store multiple versions of the same data within the database, instead of relying on something like Oracle's redo logs?

Coming specifically to this example, why do I need to store multiple versions of an HTML page in my database? It cannot act as a backup, because not all the versions are kept anyway, only some of them (say, the last 5).

+1  A: 

Coming specifically to this example, why do I need to store multiple versions of an HTML page in my database?

In case you want to revert to a previous version.

OMG Ponies
@OMG Ponies: We can also use logs for that. Also, it is not possible to store all the versions in the DB. Maybe you keep a few versions in the DB itself and the rest have to be regenerated from the logs.
Amoeba
@Primx: Restoring from a transaction log isn't something your DBA wants to do on a regular basis. Doing it once, if ever, is fine, but doing it more than once a month or so will not leave others with a high opinion of your database skills.
OMG Ponies
@Primx: Logs are useless to users. They can't restore the database from log files, yet many applications need to let the user restore a previous version of something (think wiki).
Michael Shimmins
+1  A: 

Also useful for tracking changes for auditing/change logs (even if you can't revert, you can at least see who changed what when).

Michael Shimmins
+1  A: 

You are aware that (in addition to redo logs) Oracle also stores previous versions of the same data (in the undo tablespace)? This is called multi-version concurrency control and allows lock-free selects (you can select the previous value of a row that is being changed by an ongoing transaction, without having to wait for the new data to commit).
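To make the lock-free read idea concrete, here is a minimal sketch in Python. It is a toy, not how Oracle implements undo: the class `MVCCStore` and its methods are hypothetical names invented for illustration, and a simple counter stands in for the commit timestamp. A reader takes a snapshot number and only ever sees versions committed at or before it, so a later write never blocks it.

```python
class MVCCStore:
    """Toy multi-version store (hypothetical, for illustration only).

    Every committed write appends a (commit_ts, value) pair instead of
    overwriting the previous value.
    """

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # logical commit counter, stands in for a timestamp

    def commit(self, key, value):
        """Record a new committed version of key and return its commit number."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        """Return the point in time a reader will see for its whole query."""
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before snapshot_ts."""
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None  # key did not exist at that snapshot


store = MVCCStore()
store.commit("row", "old value")
reader_snapshot = store.snapshot()     # a long-running SELECT starts here
store.commit("row", "new value")       # a writer commits afterwards
old_view = store.read("row", reader_snapshot)   # reader still sees its snapshot
new_view = store.read("row", store.snapshot())  # a fresh reader sees the update
```

The reader never waits for the writer; it just walks back to the version that was current when it started.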

Thilo
+3  A: 

One thing to recognize with BigTable and similar non-relational stores is that they have a totally different consistency model.

Once you introduce the concept of distributing your data across multiple nodes, you run the risk of consistency errors. Distributed databases are expected to be able to recover from inconsistencies that result from nodes being down, without shutting down the database or doing what would be considered a 'restore'.

Say you have a record stored in nodes 'A' and 'B'. In 'multi-master' replication, you don't have the concept of primary and copy. Rather it is possible for the record to be updated in both nodes simultaneously (especially if communication between the two is broken). Versioning can help resolve the resulting consistency issues.
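One common way to use versions for this is a version vector: each node keeps a counter of the updates it has accepted for a record, and comparing two vectors shows whether one copy supersedes the other or whether they were updated concurrently. The sketch below is an assumption about how such a check could look, not something taken from the BigTable paper; `compare` and the dictionary layout are hypothetical.

```python
def compare(a, b):
    """Compare two version vectors (dicts mapping node name -> update count).

    Returns "a_newer", "b_newer", "equal", or "conflict" when each side
    has seen an update the other has not (a concurrent write).
    """
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "conflict"
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"


# Node A accepted one more write than the copy on B knows about.
in_order = compare({"A": 2, "B": 1}, {"A": 1, "B": 1})

# The link was down and both nodes accepted a write independently.
diverged = compare({"A": 2, "B": 1}, {"A": 1, "B": 2})
```

When the result is a conflict, both versions have to be kept until the application (or a rule such as last-writer-wins) picks or merges them, which is only possible because the store holds more than one version.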

Also, these databases tend not to do 'deletes'. You simply store a newer version that is marked as deleted (or expired, or whatever). Similarly a 'rollback' would be creating a newer version of a record from an earlier one.
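A rough sketch of that append-only behaviour, in Python. All names here (`VersionedCell`, `max_versions`, `rollback_to`) are made up for illustration and do not correspond to any real BigTable API. A delete writes a tombstone, a rollback writes a copy of an old value as the newest version, and only the last few versions are retained.

```python
import itertools

_TOMBSTONE = object()  # marker value meaning "deleted as of this version"


class VersionedCell:
    """Toy cell that keeps only the newest max_versions versions."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self._versions = []           # (timestamp, value), newest first
        self._ts = itertools.count(1) # logical timestamps 1, 2, 3, ...

    def put(self, value):
        """Store a new version and drop versions beyond the retention limit."""
        ts = next(self._ts)
        self._versions.insert(0, (ts, value))
        del self._versions[self.max_versions:]
        return ts

    def delete(self):
        """A delete is just a newer version marked as deleted."""
        return self.put(_TOMBSTONE)

    def get(self, at=None):
        """Return the value at timestamp `at` (latest if None), or None."""
        for ts, value in self._versions:
            if at is None or ts <= at:
                return None if value is _TOMBSTONE else value
        return None

    def rollback_to(self, ts):
        """A rollback re-writes an old version as the newest one."""
        for version_ts, value in self._versions:
            if version_ts == ts:
                return self.put(value)
        raise KeyError(f"version {ts} is no longer retained")


cell = VersionedCell(max_versions=3)
t1 = cell.put("<html>v1</html>")
t2 = cell.put("<html>v2</html>")
cell.delete()                 # latest version is now a tombstone
after_delete = cell.get()     # reads as missing
before_delete = cell.get(at=t2)
cell.rollback_to(t1)          # v1 becomes the newest version again
after_rollback = cell.get()
```

Note that the rollback pushes the original `t1` version past the retention limit, so history is bounded, which matches the "only the last few versions" behaviour mentioned in the question.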

Gary
+1  A: 

What situations: retrieving a view of the data 'as was'. This can be very useful for diagnosis (i.e. being able to re-run a process using the same data, without restoring the whole database). See Oracle's Flashback Query for a way to do this over short timescales.

We have a situation where business rules are soft-coded on site by the customer, and stored in the database. They may change at any moment, yet are used to calculate stored data. Versioning the configuration gives us a way to 'rollback' the configuration and understand how data was derived.
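As a sketch of what such an 'as of' lookup might look like in application code (Python, with hypothetical names; `VersionedConfig` and the `tax_rate` rule are invented for the example and are not our actual schema):

```python
import bisect
from datetime import datetime


class VersionedConfig:
    """Keeps every configuration version with the time it took effect."""

    def __init__(self):
        self._times = []   # effective-from timestamps, kept sorted
        self._values = []  # configuration in effect from the matching time

    def set(self, effective_from, value):
        """Add a configuration version that applies from effective_from."""
        i = bisect.bisect_right(self._times, effective_from)
        self._times.insert(i, effective_from)
        self._values.insert(i, value)

    def as_of(self, when):
        """Return the configuration that was in effect at time `when`."""
        i = bisect.bisect_right(self._times, when)
        if i == 0:
            raise LookupError("no configuration in effect at that time")
        return self._values[i - 1]


rules = VersionedConfig()
rules.set(datetime(2010, 1, 1), {"tax_rate": 0.10})
rules.set(datetime(2010, 6, 1), {"tax_rate": 0.12})

# Explain a figure that was calculated in March, before the rule changed.
march_rules = rules.as_of(datetime(2010, 3, 15))
current_rules = rules.as_of(datetime(2010, 6, 1))
```

Storing the effective-from time with each version is what lets you answer "which rules produced this stored value?" without restoring anything.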

(I can't recall the specific term for Oracle's built-in row versioning, where it effectively stores a history table for each table).

Yes, versioning means a lot more storage, but I would say that in the places where it is useful, data is rarely volatile.

JulesLt
+2  A: 

Cases:

You want to know how data has changed in the past. Examples: tracking an order's status as it goes through the process, or tracking customers' addresses even after they have moved.

This can be a business requirement, or it can be - actually - a legal requirement. Quite often both.

TomTom
thanks, nice examples!
Amoeba