views: 2361

answers: 9
What are some strategies that people have had success with for maintaining a change history for data in a fairly complex database? One of the applications that I frequently use and develop for could really benefit from a more comprehensive way of tracking how records have changed over time. For instance, right now records can have a number of timestamp and modified-user fields, but we currently don't have a scheme for logging multiple changes, for instance if an operation is rolled back. In a perfect world, it would be possible to reconstruct the record as it was after each save, etc.

Some info on the DB:

  • Needs to have the capacity to grow by thousands of records per week
  • 50-60 Tables
  • Main revisioned tables may have several million records each
  • Reasonable number of foreign keys and indexes set
  • Using PostgreSQL 8.x
+5  A: 

In the past I have used triggers to construct db update/insert/delete logging.

Each time one of those actions is performed on a specific table, you could insert a record into a logging table that keeps track of the action, the db user who did it, a timestamp, the table it was performed on, and the previous value.

There is probably a better answer, though, as this requires you to capture the value before the actual delete or update is performed. But you could use this to do rollbacks.
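Very roughly, a sketch of what this could look like in PostgreSQL (the audit_log and customer tables and their columns are hypothetical, just to illustrate the idea):

    -- One generic logging table for all audited actions
    CREATE TABLE audit_log (
        audit_id    serial    PRIMARY KEY,
        table_name  text      NOT NULL,
        action      text      NOT NULL,               -- 'INSERT', 'UPDATE' or 'DELETE'
        db_user     text      NOT NULL DEFAULT current_user,
        changed_at  timestamp NOT NULL DEFAULT now(),
        record_id   integer,
        old_name    text,                             -- previous values of the audited columns
        old_email   text
    );

    -- Trigger function for one audited table
    CREATE OR REPLACE FUNCTION audit_customer() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            INSERT INTO audit_log (table_name, action, record_id)
            VALUES (TG_TABLE_NAME, TG_OP, NEW.customer_id);
        ELSE
            -- UPDATE or DELETE: capture the old values before they are lost
            INSERT INTO audit_log (table_name, action, record_id, old_name, old_email)
            VALUES (TG_TABLE_NAME, TG_OP, OLD.customer_id, OLD.name, OLD.email);
        END IF;
        RETURN NULL;  -- return value is ignored for AFTER triggers
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER customer_audit
    AFTER INSERT OR UPDATE OR DELETE ON customer
    FOR EACH ROW EXECUTE PROCEDURE audit_customer();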

mk
The problem with a database-level solution is that there's no business context for the action, i.e. you don't know which user did it or what they were doing. Most web apps connect to their database using a single username, so the logged-in web username is not the username seen by the trigger.
Andrew Swan
Andrew, any non-database-level solution is no audit trail at all, as it will not catch records that were not added through the GUI. Ours catches specific users because all our tables have a last_updated column, and the inserts, updates, etc. all send the person_id of the person making the update, not the web username.
HLGEM
A: 

I agree with mk. I wrote a very basic program in C a while back that journaled modifications to a string. Maybe this would work in your situation.

Ed
+11  A: 

One strategy you could use is MVCC, Multi-Version Concurrency Control. In this scheme you never update any of your tables; you only do inserts, maintaining a version number for each record. This has the advantage of providing an exact snapshot from any point in time, and it also completely sidesteps the update lock problems that plague many databases.

But it makes for a huge database, and all selects require an extra clause to pick out the current version of a record.
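As a rough sketch of the idea (the table and column names here are made up), every save inserts a new row with a higher version number, and "current" rows are selected with a max-version clause:

    CREATE TABLE customer_versions (
        customer_id  integer   NOT NULL,
        version      integer   NOT NULL,
        name         text      NOT NULL,
        email        text,
        modified_by  integer   NOT NULL,
        modified_at  timestamp NOT NULL DEFAULT now(),
        PRIMARY KEY (customer_id, version)
    );

    -- The extra clause needed on every select: only the latest version per record
    SELECT cv.*
    FROM customer_versions cv
    WHERE cv.version = (SELECT max(version)
                        FROM customer_versions
                        WHERE customer_id = cv.customer_id);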

Eric Z Beard
A: 

I've used the same method as @mk in the past. Good for maintaining an audit trail and easy to implement.

Christian Hagelid
+7  A: 

If you are using Hibernate, take a look at JBoss Envers. From the project homepage:

The Envers project aims to enable easy versioning of persistent JPA classes. All you have to do is annotate your persistent class, or some of its properties that you want to version, with @Versioned. For each versioned entity, a table will be created which will hold the history of changes made to the entity. You can then retrieve and query historical data without much effort.

This is somewhat similar to Eric's approach, but probably much less effort. I don't know what language/technology you use to access the database, though.

Martin Klinke
Has anyone used JBoss Envers in a high-transaction eCom site?
boyd4715
+2  A: 

The only problem with using triggers is that they add performance overhead to every insert/update/delete. For higher scalability and performance, you want to keep each database transaction to a minimum. Auditing via triggers increases the time required to complete the transaction and, depending on the volume, may cause performance issues.

Another way is to explore whether the database provides a way of mining the "redo" logs, as is the case in Oracle. Redo logs are what the database uses to recreate the data if it fails and has to recover.

GotoError
+2  A: 

Similar to a trigger (or even in combination with one), you can have every transaction fire a logging event asynchronously and have another process (or just a thread) actually handle the logging. There are many ways to implement this depending on your application. I suggest having the application fire the event so that it does not put unnecessary load on your first transaction (which sometimes leads to locks from cascading audit logs).

In addition, you may be able to improve performance of the primary database by keeping the audit database in a separate location.
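One possible shape for this, sketched in plain SQL (the audit_queue and audit_archive tables are hypothetical, and this assumes a single background worker doing the draining): the main transaction only appends a small row to a queue table, and the worker later moves queued rows into the real audit store. If that store lives in a separate audit database, the worker would read here and write there over its own connections instead of a single statement.

    -- Lightweight queue the application writes to inside its own transaction
    CREATE TABLE audit_queue (
        queue_id    serial    PRIMARY KEY,
        table_name  text      NOT NULL,
        action      text      NOT NULL,
        person_id   integer   NOT NULL,
        old_values  text,                    -- serialized previous state
        queued_at   timestamp NOT NULL DEFAULT now()
    );

    -- Drain step run periodically by the worker, when queue and archive
    -- are in the same database, inside one transaction
    BEGIN;
    INSERT INTO audit_archive (queue_id, table_name, action, person_id, old_values, queued_at)
        SELECT queue_id, table_name, action, person_id, old_values, queued_at
        FROM audit_queue
        ORDER BY queue_id;
    -- delete only what was just archived, so rows queued mid-drain are kept
    DELETE FROM audit_queue
        WHERE queue_id IN (SELECT queue_id FROM audit_archive);
    COMMIT;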

Jeff Davis
A: 

Consider using a temporal database.
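If a full temporal database is not an option, the core valid-time idea can also be hand-rolled in ordinary SQL. A rough sketch with hypothetical names, where each row records the period during which it was the current version:

    CREATE TABLE customer_history (
        customer_id  integer   NOT NULL,
        name         text      NOT NULL,
        email        text,
        valid_from   timestamp NOT NULL,
        valid_to     timestamp              -- NULL means "still current"
    );

    -- Reconstruct the record as it looked at a given point in time
    SELECT *
    FROM customer_history
    WHERE customer_id = 42
      AND valid_from <= '2009-01-01'
      AND (valid_to IS NULL OR valid_to > '2009-01-01');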

Andrew Swan
+1  A: 

I use SQL Server, not PostgreSQL, so I'm not sure if this will work for you or not, but Pop Rivett had a great article on creating an audit trail here: Pop rivett's SQL Server FAQ No.5: Pop on the Audit Trail

Build an audit table, then create a trigger for each table you want to audit.

Hint: use CodeSmith to build your triggers.

Gern Blandston