views:

187

answers:

2

I'm a J2EE developer & we are using hibernate mapping with a PostgreSQL database.

We have to keep track of any changes occurs in the database, in others words all previous & current values of any field should be saved. Each field can be any type (bytea, int, char...)

With a simple table it is easy but we a graph of objects things are more difficult.

So we have, speaking in a UML point of view, a graph of objects to store in the database with every changes & the user.

Any idea or pattern how to do that?

+1  A: 

A common way to do this is by storing versions of objects.

If add a "version" and a "deleted" field to each table that you want to store an audit trail on, then instead of doing normal updates and deletes, follow these rules:

  • Insert - Set the version number to 0 and insert as normal.
  • Update - Increment the version number and do an insert instead.
  • Delete - Increment the version number, set the deleted field to true and do an insert instead.
  • Retrieve - Get the record with the highest version number and return that.

If you follow this pattern, every time you update you will create a new record rather than overwriting the old data, so you will always be able to track back and see all the old objects.

This will work exactly the same for graphs of objects, just add the new fields to each table within the object graph, and handle each insert/update/delete for each table as described above.

If you need to know which user made the modification, you just add a "ModifiedBy" field as well.

(You can either do this processing in your DA layer code, or if you prefer you can use database triggers to catch your update/delete/retrieve calls and re-process them following the rules.)

Obviously, you need to consider space requirements, as every single update will result in a fully new record. If your application is update heavy, you are going to generate a lot of data. It's common to also include a "last modified time" fields so you can process the database off line and delete data older than required.

Simon P Stevens
Bear in mind that this approach virtually destroys relational integrity. In a regular table we would have a primary key PK_COL. Now we have a UID of (PK_COL, VERSION_NO). This makes it harder to enforce the rules that PK_COL uniquely identifies something in the real world. It also complicates foreign key relationships: if the parent record changes (insert a new version) do we insert a new version of all its children *even though none of them have changed*? Finally this approach punishes the most common access path by requiring an additional filter to get the current view.
APC
My previous comment concerns this as a general approach to auditing. It may be that a database storing "graphs of objects" has little need for relational integrity, in which case this technique may satisfy Mada's requirement.
APC
+1  A: 

Current RDBMS implementations are not very good at handling temporal data. That's one reason why maintaining separate journalling tables through triggers is the usual approach. (The other is that audit trails frequently have different use cases to regular data, and having them in separate tables makes it easier to manage access to them). Oracle does a pretty slick job of hiding the plumbing in its Total Recall product, but being Oracle it charges $$$ for this.

Scott Bailey has published a presentation on temporal data in PostgreSQL. Alas it won't help you right now but it seems like some features planned for 8.5 and 8.6 will enable the transparent storage of time-related data. Find out more.

APC