I'm trying to create some functionality that keeps an audit trail of how data in a given user form has been changed over time, with a dated audit at the foot of that page. For example:

02/04/09 21:49 Name changed from "Tom" to "Chris".

I'm doing this by storing the data in its present format in the session and then, on save, checking whether there are any differences in the data being stored. If there are, I'm storing the data as it was before the latest edit in a table called history, and storing the new values in the current user table.
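
A minimal sketch of that comparison step, assuming the session snapshot and the submitted form are both plain dicts keyed by field name. The function name `describe_changes`, the field names, and the day/month date format are illustrative assumptions, not part of the original code.

```python
from datetime import datetime

def describe_changes(before, after, when):
    """Return one audit line per field whose value differs between the snapshots."""
    stamp = when.strftime("%d/%m/%y %H:%M")  # assumed day/month/year order
    lines = []
    for field in sorted(after):
        old, new = before.get(field), after[field]
        if old != new:
            lines.append(f'{stamp} {field.capitalize()} changed from "{old}" to "{new}".')
    return lines

before = {"name": "Tom", "email": "tom@example.com"}   # snapshot held in the session
after = {"name": "Chris", "email": "tom@example.com"}  # values submitted on save
for line in describe_changes(before, after, datetime(2009, 4, 2, 21, 49)):
    print(line)  # 02/04/09 21:49 Name changed from "Tom" to "Chris".
```

An empty list means nothing changed, so no history row needs to be written.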

Is this the best approach to be taking?

A: 

The session involvement makes me a little wary (are you sure you're handling it properly when two users are working on the same data at the same time?), but in general, yeah, keeping a history table is the right thing.

chaos
I think so, though I am a newbie! How do you mean? I've built in a check that stores the last edit date and time when an edit begins, and I validate that before saving. If the date and time have changed, the user is asked to check over the latest changes before confirming they want to save.
chriswattsuk
Sounds like you're doing great, then.
chaos
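
The last-edit check discussed in the comments above could look roughly like this. Note this is a variant rather than the asker's exact method: it folds the check into the UPDATE itself, so there is no gap between validating and saving. The table layout, the `last_edit` column, and the `save_name` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, last_edit TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Tom', '2009-04-02 21:40:00')")

def save_name(conn, user_id, new_name, seen_last_edit, now):
    """Update only if nobody has saved since this editor loaded the row."""
    cur = conn.execute(
        "UPDATE users SET name = ?, last_edit = ? WHERE id = ? AND last_edit = ?",
        (new_name, now, user_id, seen_last_edit),
    )
    return cur.rowcount == 1  # 0 rows means someone else saved first

# Two editors both loaded the row when last_edit was 21:40:00.
first = save_name(conn, 1, "Chris", "2009-04-02 21:40:00", "2009-04-02 21:49:00")
second = save_name(conn, 1, "Thomas", "2009-04-02 21:40:00", "2009-04-02 21:50:00")
print(first, second)  # True False
```

When `save_name` returns False, the app can reload the row and ask the user to review the newer changes.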
+3  A: 

One suggestion; this would be relatively easy to do in a database trigger. In that case, you would never have to worry about whether the code running the update remembers to add a history record.
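
As a rough sketch of what such a trigger could look like, here is an SQLite version driven from Python. The `users` and `history` layouts and the trigger name are assumptions for illustration, and trigger syntax differs between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE history (
    user_id    INTEGER,
    old_name   TEXT,
    new_name   TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Fires only when the name actually changes; IS NOT is NULL-safe in SQLite.
CREATE TRIGGER users_name_audit
AFTER UPDATE OF name ON users
WHEN OLD.name IS NOT NEW.name
BEGIN
    INSERT INTO history (user_id, old_name, new_name)
    VALUES (OLD.id, OLD.name, NEW.name);
END;
""")

conn.execute("INSERT INTO users (id, name) VALUES (1, 'Tom')")
conn.execute("UPDATE users SET name = 'Chris' WHERE id = 1")
conn.execute("UPDATE users SET name = 'Chris' WHERE id = 1")  # no change, no history row
rows = conn.execute("SELECT user_id, old_name, new_name FROM history").fetchall()
print(rows)  # [(1, 'Tom', 'Chris')]
```

The application code only runs the UPDATE; the history row is written by the database.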

Chase Seibert
Ah yes triggers! I have been a bit fearful of triggers but feel I should take the plunge! :)
chriswattsuk
A: 

I would also think about a database trigger on insert or update to record change details (who, when, what, value before, value after) to a separate audit table. That way you know that even if the data is changed outside of your app by using the database directly, the change will still be picked up.

You might also want to do something to detect whether the data has been changed outside of your app, such as calculating a hash or CRC of the record, storing it in a field somewhere, and then checking it when reading the data.
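
A minimal sketch of the hash idea, assuming records are handled as dicts and the checksum lives in an extra `checksum` field. The helper name `row_checksum` and the field list are hypothetical.

```python
import hashlib

def row_checksum(row, fields):
    """SHA-256 over the audited fields, taken in a fixed order."""
    payload = "\x1f".join(f"{name}={row[name]!r}" for name in fields)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

FIELDS = ("id", "name", "email")
row = {"id": 1, "name": "Tom", "email": "tom@example.com"}
row["checksum"] = row_checksum(row, FIELDS)  # stored when the app writes the record

row["name"] = "Chris"                         # simulated edit made outside the app
tampered = row_checksum(row, FIELDS) != row["checksum"]
print(tampered)  # True
```

A plain hash only catches edits by someone who does not also recompute the checksum; a keyed hash (for example `hmac` with a secret held by the app) would be harder to forge.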

PaulHurleyuk
A: 

I've always been a fan of using one table instead of breaking it up into an "active" table and a "history" table. I put 4 columns on these tables, all timestamps: created, deleted, start, end. "created" and "deleted" are fairly self-explanatory. The "start" and "end" timestamps are for when the record was actually the "active" record. The currently-active record would have a "start" time prior to now() and a NULL "end" time. By separating out the "created" and "start" times, you can schedule changes to take place in the future.

This design, as opposed to the two-table design, allows you to easily write queries that will automatically operate on the right data. Suppose your table is storing the tax rate over time... you don't want to have all your queries that use tax rates in their calculations have the extra complexity of deciding to look stuff up in a history table when processing old invoices, for example... you can just look up the tax rate in effect at the time the invoice was created in one query, regardless of whether it's the current tax rate or not.
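
Here is a small sketch of the tax-rate lookup under that design, using SQLite from Python. The columns are named `start_time` and `end_time` only because END is an SQL keyword; the rates, dates, and the `rate_at` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tax_rate (
    rate       REAL NOT NULL,
    created    TEXT NOT NULL,
    deleted    TEXT,
    start_time TEXT NOT NULL,
    end_time   TEXT            -- NULL means still in effect
);
INSERT INTO tax_rate VALUES (0.10, '2008-01-01', NULL, '2008-01-01', '2008-12-01');
-- Created in November but scheduled to start in December.
INSERT INTO tax_rate VALUES (0.12, '2008-11-24', NULL, '2008-12-01', NULL);
""")

def rate_at(conn, when):
    """Return the rate in effect at the given ISO date, or None."""
    row = conn.execute(
        """SELECT rate FROM tax_rate
           WHERE deleted IS NULL
             AND start_time <= :t
             AND (end_time IS NULL OR end_time > :t)""",
        {"t": when},
    ).fetchone()
    return row[0] if row else None

print(rate_at(conn, "2008-06-15"))  # 0.1  (old invoice)
print(rate_at(conn, "2009-04-02"))  # 0.12 (current rate)
```

The same query serves old invoices and new ones; only the timestamp passed in changes.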

This idea is not originally mine (although I did re-invent the rough idea on my own prior to reading about it)... you can find a detailed discussion of it in this online book.

rmeador
+4  A: 

I'm not sure there is one "best approach"; there are so many variables to take into consideration, including how far down the development path you are.

Some comments, having been through both code-based and db-trigger auditing solutions; I hope you can see how where you are now (in terms of development) could affect these issues:

  • If you need to map the user who changed the data (which you normally do) then db triggers will need to get this information somehow. This is not impossible, but it is more work, and there are several ways to approach it (db user executing the query, a common user column in each table, etc.)
  • If you use db triggers and you rely on affected rows count returned from queries, then your audit triggers need to have this turned off, or your existing code logic modified to expect them.
  • IMHO db triggers offer more security, but they are not foolproof, as anyone with appropriate access can disable the triggers, modify data and then enable them again. In other words, ensure your db security access rights are tight.
  • Having a single table for history is not a bad way to go, although you will have more work to do (and data to store) if you are auditing history for multiple tables, especially when it comes to reconstructing the audit trail.
  • Having an audit history table for each table is another option. You just need each column in the audit table to be nullable, as well as storing date and time of action (insert/update/delete) and the user associated with the action.
  • If you go with the single table option, unless you have a lot of time to spend on this, don't get too fancy trying to audit only on updates or deletes. It may be tempting to skip inserts (since most apps do these more often than updates or deletes), but reconstructing the audit history without them is a non-trivial task.
  • These audit tables can get huge, so have a strategy for when they start affecting performance. Options include partitioning tables onto different disks, archiving, etc. Basically, think about this now and not when it becomes a problem :)
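
To make a couple of the points above concrete, here is a sketch of the per-table audit option in SQLite, using the "common user column" approach to tell the triggers who made the change. The table names, the `modified_by` column, and the trigger names are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id          INTEGER PRIMARY KEY,
    name        TEXT,
    modified_by TEXT             -- set by the application on every write
);
CREATE TABLE users_audit (
    action    TEXT NOT NULL,     -- 'insert', 'update' or 'delete'
    action_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    action_by TEXT,
    id        INTEGER,           -- copies of the audited columns, all nullable
    name      TEXT
);
CREATE TRIGGER users_ai AFTER INSERT ON users BEGIN
    INSERT INTO users_audit (action, action_by, id, name)
    VALUES ('insert', NEW.modified_by, NEW.id, NEW.name);
END;
CREATE TRIGGER users_au AFTER UPDATE ON users BEGIN
    INSERT INTO users_audit (action, action_by, id, name)
    VALUES ('update', NEW.modified_by, NEW.id, NEW.name);
END;
CREATE TRIGGER users_ad AFTER DELETE ON users BEGIN
    INSERT INTO users_audit (action, action_by, id, name)
    VALUES ('delete', OLD.modified_by, OLD.id, OLD.name);
END;
""")

conn.execute("INSERT INTO users VALUES (1, 'Tom', 'alice')")
conn.execute("UPDATE users SET name = 'Chris', modified_by = 'bob' WHERE id = 1")
conn.execute("DELETE FROM users WHERE id = 1")
trail = conn.execute(
    "SELECT action, action_by, name FROM users_audit ORDER BY rowid"
).fetchall()
print(trail)
```

One limitation shows up in the delete row: a DELETE carries no new column values, so the trigger can only record the last user who modified the row, not the user who deleted it. That is part of the extra work of getting user information into triggers.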
Si
A: 

I think your proposal would involve writing a lot of code/metadata to enable comparison of objects/records so you get a business-level audit.

On the other hand, a database trigger may not give you a high enough level view of what happened. This may be acceptable if you use the audit so infrequently that the effort of recreating the business meaning is OK.

This also seems like a good application for AOP (Aspects), where you could use reflection on the object model to dump something meaningful without requiring a lot of metadata.
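
Python has no AOP framework built in, but a decorator gives a rough flavour of the idea: wrap the methods that change an object and use reflection (`vars`) to report what changed. The `audited` decorator, the `User` class, and the `_audit` list are hypothetical names for this sketch.

```python
import functools

def audited(method):
    """Wrap a method and record which public attributes it changed."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        before = dict(vars(self))            # shallow snapshot of the attributes
        result = method(self, *args, **kwargs)
        for name, new in vars(self).items():
            if name.startswith("_"):
                continue                     # skip private state such as _audit
            old = before.get(name)
            if old != new:
                self._audit.append(f'{name} changed from "{old}" to "{new}"')
        return result
    return wrapper

class User:
    def __init__(self, name):
        self._audit = []
        self.name = name

    @audited
    def rename(self, name):
        self.name = name

user = User("Tom")
user.rename("Chris")
print(user._audit)  # ['name changed from "Tom" to "Chris"']
```

The business methods stay free of audit code, and no per-field metadata is needed.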

edoloughlin