
My requirement is for a data model where a full audit trail is retained for changes to every attribute of every object. Object definitions are also fluid: new attributes can appear or go away over time. This audit trail will live separately from the original databases, so a trigger-based auditing model won't work.

In a relational database, I can implement this with a single large ATTRIBUTE_HISTORY table that records every individual change to each attribute, with appropriate timestamp and responsibility fields.
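For illustration, a minimal sketch of such a table using Python's built-in sqlite3 (the column names are placeholders of my own, not a fixed schema):

    import sqlite3

    conn = sqlite3.connect("audit.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS attribute_history (
            id          INTEGER PRIMARY KEY,
            object_type TEXT NOT NULL,  -- kind of object being audited
            object_id   TEXT NOT NULL,  -- identifier of the object instance
            attribute   TEXT NOT NULL,  -- attribute name (free-form, so new ones can appear)
            old_value   TEXT,           -- NULL when the attribute first appears
            new_value   TEXT,           -- NULL when the attribute goes away
            changed_at  TEXT NOT NULL,  -- timestamp field
            changed_by  TEXT NOT NULL   -- responsibility field
        )
    """)
    conn.commit()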

My question: are any of the newer storage models (BigTable, HBase, CouchDB, RDF stores, etc.) superior to an RDBMS for this purpose?

A: 

You can also create a logging system in application code. Log calls to every function that modifies the database and results in a successful COMMIT.

Answer to your question: no, just use an RDBMS. It'll be easier to run queries on the log.
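A rough sketch of the shape of that idea (the change_log table and the audited decorator here are made up for the example):

    import functools
    import sqlite3
    from datetime import datetime, timezone

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE change_log (func TEXT, args TEXT, logged_at TEXT)")

    def audited(func):
        """Log every successful call to a function that modifies the database."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)  # only log if this returns without raising
            conn.execute(
                "INSERT INTO change_log VALUES (?, ?, ?)",
                (func.__name__, repr((args, kwargs)),
                 datetime.now(timezone.utc).isoformat()),
            )
            conn.commit()
            return result
        return wrapper

    @audited
    def set_price(product_id, price):
        ...  # the actual UPDATE plus COMMIT would go here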

Seun Osewa
Trouble with auditing in the application is that not every change to data happens in the application. In my opinion it is a very bad practice to put auditing in the application only.
HLGEM
A: 

I see no reason why a trigger can't reference a different database. However, all changes would fail if that database were unavailable, which can be a problem if the audit database is on another server and the connection is down. Our own auditing is done through triggers, and we keep a separate audit database.

HLGEM
A: 

I don't think a particular database paradigm can be considered superior to any other for an audit log. It isn't so much a data model problem as it is a logging problem and can be considered somewhat orthogonal to the data store.

That being said, CouchDB can be configured to never delete old versions of documents. With the addition of a timestamp and possibly a user field to each document, you could use that feature to automatically keep an entire history of every object ever stored in the db. That might be the easiest out-of-the-box setup for audit logging you could get in a database.

As for the others, I don't know what kind of support they might have for this.

Caveats:

(You would also have to follow a never-delete strategy for objects in the db and just mark objects as deleted instead)

(For an RDBMS the easiest solution might be a simple table that logs every insert, update, or delete statement run against the database in a text field, with timestamp and user fields. I did that once on a Postgres database and it worked pretty well for keeping history; a sketch of the idea follows below)
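Roughly what I mean (table and helper names are invented for the sketch, and it is shown with SQLite rather than Postgres):

    import sqlite3
    from datetime import datetime, timezone

    conn = sqlite3.connect("app.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS statement_log (
            stmt   TEXT NOT NULL,  -- the SQL that was run
            run_at TEXT NOT NULL,  -- timestamp field
            run_by TEXT NOT NULL   -- user field
        )
    """)

    def logged_execute(sql, params=(), user="unknown"):
        """Run a statement and, if it is a write, record its text."""
        conn.execute(sql, params)
        if sql.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE")):
            conn.execute(
                "INSERT INTO statement_log VALUES (?, ?, ?)",
                (sql, datetime.now(timezone.utc).isoformat(), user),
            )
        conn.commit()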

Jeremy Wall
A: 

Create a table that holds the names of the tables you want to audit (e.g. AuditTable); its minimum columns should be TableName (varchar) and RandomValue (float). Add a trigger on AuditTable that fires whenever the RandomValue changes; this trigger's job is to drop and dynamically re-create the audit trigger for each table listed (TableName) in AuditTable. The audit trigger for each table should insert into an AuditRecord table, capturing the table name, primary key ID, action type (INSERT/UPDATE/DELETE), original field values and updated field values. If a table's structure changes, a simple update of its RandomValue in AuditTable will cause the triggers to regenerate. You will need to write the code that auto-generates the trigger for a given table; it is easiest if every audited table has a single integer primary key.
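A rough sketch of the regeneration step, shown with SQLite from Python for brevity (a real version would be written in your database's own dialect; the names follow the description above, and a production version would likely store one audit row per column rather than concatenated values):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
        CREATE TABLE AuditRecord (
            table_name TEXT, row_id INTEGER, action TEXT,
            old_values TEXT, new_values TEXT
        );
    """)

    def regenerate_audit_trigger(table):
        """Drop and re-create the UPDATE audit trigger for one table."""
        cols = [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]
        old_vals = " || '|' || ".join(f"COALESCE(OLD.{c}, '')" for c in cols)
        new_vals = " || '|' || ".join(f"COALESCE(NEW.{c}, '')" for c in cols)
        conn.executescript(f"""
            DROP TRIGGER IF EXISTS audit_{table}_update;
            CREATE TRIGGER audit_{table}_update AFTER UPDATE ON {table}
            BEGIN
                INSERT INTO AuditRecord
                VALUES ('{table}', OLD.id, 'UPDATE', {old_vals}, {new_vals});
            END;
        """)

    regenerate_audit_trigger("customer")  # re-run whenever the table structure changes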

ftank99
A: 

Performance would be a concern for such audit trails. I would go for a cache (one that is reasonably fault-tolerant) and persist its contents once the count reaches a certain threshold (say 1000 records), ideally as a single batch update.

I suspect in-memory databases with persistence options (like H2) could do the same, but I haven't used one myself.
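A toy sketch of that buffering idea (the threshold, schema and storage are placeholders):

    import sqlite3

    FLUSH_THRESHOLD = 1000  # persist once this many records accumulate
    _buffer = []

    conn = sqlite3.connect("audit.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit (obj TEXT, attr TEXT, val TEXT, at TEXT)"
    )

    def record_change(obj, attr, val, at):
        """Buffer an audit record in memory; flush in one batch at the threshold."""
        _buffer.append((obj, attr, val, at))
        if len(_buffer) >= FLUSH_THRESHOLD:
            flush()

    def flush():
        conn.executemany("INSERT INTO audit VALUES (?, ?, ?, ?)", _buffer)
        conn.commit()  # one batch commit instead of 1000 small ones
        _buffer.clear()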

Sathya
+3  A: 

The question of how to store the data depends on how it is going to be used, amongst other issues. I'd suggest going with something simple that you understand for now, and testing it against the likely load if you have an idea of what to expect. Then make improvements in future as necessary.

In relation to your issue with a trigger-based auditing system, since it sounds like you're set on having the work done at the database level, I have one suggestion. Use triggers to log changes to a table within the database, then overnight (or however frequently) process the contents of that table, write the audit trail to wherever it is being stored, and empty the table. This way you can capture changes at the database level but still fulfil your requirement to store the actual audit trail elsewhere.
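A minimal sketch of that overnight step (the table names and the destination store are assumptions for the example):

    import sqlite3

    src = sqlite3.connect("app.db")    # triggers log changes into audit_staging here
    dst = sqlite3.connect("audit.db")  # wherever the real audit trail lives

    src.execute("""CREATE TABLE IF NOT EXISTS audit_staging
                   (id INTEGER PRIMARY KEY, change TEXT, changed_at TEXT)""")
    dst.execute("""CREATE TABLE IF NOT EXISTS audit_trail
                   (change TEXT, changed_at TEXT)""")

    def drain_staging():
        """Overnight job: ship captured changes to the audit store, then empty staging."""
        rows = src.execute(
            "SELECT change, changed_at FROM audit_staging ORDER BY id").fetchall()
        dst.executemany("INSERT INTO audit_trail VALUES (?, ?)", rows)
        dst.commit()  # only clear staging after the copy is safely committed
        src.execute("DELETE FROM audit_staging")
        src.commit()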

Robin