Every time I need to design a new database I spend quite some time thinking about how I should set up the database schema to keep an audit log of the changes.
Some questions have already been asked here about this, but I don't agree that there is a single best approach for all scenarios:
- Database Design For Revisions
- Best design for a changelog auditing database table
- Ideas on database design for capturing audit trails
I have also stumbled upon this interesting article on Maintaining a Log of Database Changes that tries to list the pros and cons of each approach. It's very well written and has interesting information, but it has made my decisions even harder.
My question is: Is there a reference that I can use, maybe a book or something like a decision tree, that I can refer to when deciding which way to go based on some input variables, like:
- The maturity of the database schema
- How the logs will be queried
- The probability that records will need to be recreated
- What's more important: write or read performance
- Nature of the values that are being logged (string, numbers, blobs)
- Storage space available
The approaches that I know are:
1. Add columns for created and modified date and user
Table example:
- id
- value_1
- value_2
- value_3
- created_date
- modified_date
- created_by
- modified_by
Major cons: We lose the history of the modifications. Can't roll back after commit.
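To make approach 1 concrete, here is a minimal sketch of what I mean, assuming SQLite through Python's `sqlite3` module. The table name `item` and the helpers `insert_item` and `update_item` are made up for illustration, and I kept only one value column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE item (
        id            INTEGER PRIMARY KEY,
        value_1       TEXT,
        created_date  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        modified_date TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        created_by    TEXT NOT NULL,
        modified_by   TEXT NOT NULL
    )
""")

def insert_item(value_1, user):
    # Hypothetical helper: the creator is also the first modifier.
    cur = conn.execute(
        "INSERT INTO item (value_1, created_by, modified_by) VALUES (?, ?, ?)",
        (value_1, user, user),
    )
    return cur.lastrowid

def update_item(item_id, value_1, user):
    # Hypothetical helper: overwrites the value in place, so the old value is gone.
    conn.execute(
        "UPDATE item SET value_1 = ?, modified_by = ?, "
        "modified_date = CURRENT_TIMESTAMP WHERE id = ?",
        (value_1, user, item_id),
    )

item_id = insert_item("first", "alice")
update_item(item_id, "second", "bob")

row = conn.execute(
    "SELECT value_1, created_by, modified_by FROM item WHERE id = ?", (item_id,)
).fetchone()
row_count = conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]
```

After the update there is still a single row and nothing records that the value used to be "first", which is exactly the con listed above.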
2. Insert only tables
- id
- value_1
- value_2
- value_3
- from
- to
- deleted (boolean)
- user
Major cons: How to keep foreign keys up to date? Huge space needed
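A rough sketch of approach 2, again assuming SQLite through Python's `sqlite3` module. I renamed `from`, `to` and `user` to `valid_from`, `valid_to` and `changed_by` because `from` and `to` are SQL keywords, and I added a surrogate `version_id` key. The helpers `write_version` and `value_as_of` are hypothetical, and timestamps are passed in as ISO strings to keep the example deterministic. Strictly speaking, this variant does one update (closing `valid_to` on the previous version); a purist version would derive it from the next row instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE item_version (
        version_id INTEGER PRIMARY KEY,   -- surrogate key, one per version
        id         INTEGER NOT NULL,      -- logical id shared by all versions
        value_1    TEXT,
        valid_from TEXT NOT NULL,
        valid_to   TEXT,                  -- NULL means "current version"
        deleted    INTEGER NOT NULL DEFAULT 0,
        changed_by TEXT NOT NULL
    )
""")

def write_version(item_id, value_1, user, now, deleted=False):
    # Close the currently open version (if any), then append the new one.
    conn.execute(
        "UPDATE item_version SET valid_to = ? WHERE id = ? AND valid_to IS NULL",
        (now, item_id),
    )
    conn.execute(
        "INSERT INTO item_version (id, value_1, valid_from, deleted, changed_by) "
        "VALUES (?, ?, ?, ?, ?)",
        (item_id, value_1, now, int(deleted), user),
    )

def value_as_of(item_id, when):
    # Return value_1 as it was at `when`, or None if missing or deleted.
    row = conn.execute(
        "SELECT value_1, deleted FROM item_version "
        "WHERE id = ? AND valid_from <= ? AND (valid_to IS NULL OR valid_to > ?)",
        (item_id, when, when),
    ).fetchone()
    if row is None or row[1]:
        return None
    return row[0]

write_version(1, "first", "alice", "2024-01-01T00:00:00")
write_version(1, "second", "bob", "2024-02-01T00:00:00")
write_version(1, None, "carol", "2024-03-01T00:00:00", deleted=True)

version_count = conn.execute("SELECT COUNT(*) FROM item_version").fetchone()[0]
```

Note that other tables can only reference the logical `id`, which is no longer unique here, so the database cannot enforce such a foreign key directly. That is the problem I mean in the cons above.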
3. Create a separate history table for each table
History table example:
- id
- value_1
- value_2
- value_3
- value_4
- user
- deleted (boolean)
- timestamp
Major cons: Needs to duplicate all audited tables. If the schema changes, all the logs need to be migrated too.
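One way approach 3 could be wired up is with triggers. The sketch below assumes SQLite through Python's `sqlite3` module; the table and trigger names are made up, the timestamp column is called `changed_at`, and I left out the `user` column because SQLite has no session user that a trigger could read (the application would have to supply it).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (
        id      INTEGER PRIMARY KEY,
        value_1 TEXT
    );

    -- Mirrors the columns of item, plus audit metadata.
    CREATE TABLE item_history (
        history_id INTEGER PRIMARY KEY,
        id         INTEGER NOT NULL,
        value_1    TEXT,
        deleted    INTEGER NOT NULL DEFAULT 0,
        changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    CREATE TRIGGER item_after_insert AFTER INSERT ON item BEGIN
        INSERT INTO item_history (id, value_1) VALUES (NEW.id, NEW.value_1);
    END;

    CREATE TRIGGER item_after_update AFTER UPDATE ON item BEGIN
        INSERT INTO item_history (id, value_1) VALUES (NEW.id, NEW.value_1);
    END;

    -- On delete, keep the last known values and flag the row as deleted.
    CREATE TRIGGER item_after_delete AFTER DELETE ON item BEGIN
        INSERT INTO item_history (id, value_1, deleted)
        VALUES (OLD.id, OLD.value_1, 1);
    END;
""")

conn.execute("INSERT INTO item (id, value_1) VALUES (1, 'first')")
conn.execute("UPDATE item SET value_1 = 'second' WHERE id = 1")
conn.execute("DELETE FROM item WHERE id = 1")

history = conn.execute(
    "SELECT value_1, deleted FROM item_history ORDER BY history_id"
).fetchall()
```

Every column added to `item` would also have to be added to `item_history` and to all three triggers, which is the duplication I am worried about.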
4. Create a consolidated history table for all tables
History table example:
- table_name
- field
- user
- new_value
- deleted (boolean)
- timestamp
Major cons: Will I be able to easily recreate the records (roll back) if needed? The new_value column needs to be a huge string so it can support all different column types.
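To explore the recreation question for approach 4, here is a small sketch, again assuming SQLite through Python's `sqlite3` module. The `row_id` column is my addition (the list above has no way to tell which record a change belongs to), `user` and `timestamp` are renamed to `changed_by` and `changed_at`, and `log_change` and `rebuild` are hypothetical helpers. Replay order relies on the auto-assigned `log_id` rather than on the timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE audit_log (
        log_id     INTEGER PRIMARY KEY,
        table_name TEXT NOT NULL,
        row_id     INTEGER NOT NULL,   -- assumption: needed to identify the record
        field      TEXT,
        new_value  TEXT,               -- every type is flattened to text
        deleted    INTEGER NOT NULL DEFAULT 0,
        changed_by TEXT NOT NULL,
        changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
""")

def log_change(table_name, row_id, field, new_value, user, deleted=False):
    # Store one field change (or a deletion marker) as a text value.
    text = None if new_value is None else str(new_value)
    conn.execute(
        "INSERT INTO audit_log "
        "(table_name, row_id, field, new_value, deleted, changed_by) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (table_name, row_id, field, text, int(deleted), user),
    )

def rebuild(table_name, row_id, up_to_log_id):
    # Replay the log in order to recreate the record as of a given entry.
    record = None
    rows = conn.execute(
        "SELECT field, new_value, deleted FROM audit_log "
        "WHERE table_name = ? AND row_id = ? AND log_id <= ? ORDER BY log_id",
        (table_name, row_id, up_to_log_id),
    )
    for field, new_value, deleted in rows:
        if deleted:
            record = None
        else:
            record = {} if record is None else record
            record[field] = new_value
    return record

log_change("item", 1, "value_1", "first", "alice")
log_change("item", 1, "value_2", 42, "alice")
log_change("item", 1, "value_1", "second", "bob")
log_change("item", 1, None, None, "carol", deleted=True)
```

Recreating a record works, but only by replaying every change, and the values come back as strings (`42` becomes `"42"`), so the original column types have to be restored some other way.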