I log insert and update info in every table:

create_date    TIMESTAMP
create_user_id INT
update_date    TIMESTAMP
update_user_id INT

I thought that instead of putting them on every table, I could create only one log table and reference it from every table, so that I retrieve the log info only when I need it. I know it depends on the application (I am developing a small commercial ERP-like application), but do you have any experience with this type of table? Any performance or maintenance issues? Which do you prefer? Thanks.

log_id         LONG
create_date    TIMESTAMP
create_user_id INT
update_date    TIMESTAMP
update_user_id INT
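For illustration, each regular table would then carry only a foreign key into this shared log table. A minimal MySQL sketch (the product table and its columns are assumptions for illustration, and LONG is rendered as BIGINT):

CREATE TABLE log (
    log_id         BIGINT AUTO_INCREMENT PRIMARY KEY,
    create_date    TIMESTAMP NULL,
    create_user_id INT,
    update_date    TIMESTAMP NULL,
    update_user_id INT
);

-- a hypothetical application table referencing the shared log
CREATE TABLE product (
    product_id INT PRIMARY KEY,
    name       VARCHAR(100),
    log_id     BIGINT,
    FOREIGN KEY (log_id) REFERENCES log (log_id)
);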

Edit: I settled on a solution that uses only update_date and update_user_id columns on every table (an insert is treated as an update, and data is never deleted, just marked inactive). I use MySQL, and in the production environment there will be master and slave servers. I will use the replication logs on the slave server to audit data changes if I ever need to backtrack the data, so the design is simpler and auditing is still possible, although it is not easy.
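Each table then ends up with something like this (a minimal sketch; the is_active flag name is an assumption, since I only said rows are inactivated rather than deleted, and the slave's binary log can later be inspected with mysqlbinlog when backtracking is needed):

CREATE TABLE product (
    product_id     INT PRIMARY KEY,
    name           VARCHAR(100),
    is_active      TINYINT(1) NOT NULL DEFAULT 1,  -- soft-delete flag (assumed name)
    update_date    TIMESTAMP NOT NULL,
    update_user_id INT NOT NULL
);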

+4  A: 

The single log table could easily become a bottleneck, since it is written to in addition to any other writes going on.

You'd produce an additional JOIN for some queries too.
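For instance, showing the audit info next to a row would need an extra join under the proposed design (a sketch with hypothetical table names):

SELECT p.name, l.update_date, l.update_user_id
  FROM product p
  JOIN log l ON l.log_id = p.log_id;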

I don't see the benefit of the separate table, apart from making the rest of the DB tables a bit 'cleaner'.

Fiona Holder
Using a separate table, you could record every transaction, not just the first and last. But I see the question does not propose to do this, which I find curious.
recursive
@recursive: historic data will not be used frequently, and comparing weekly backups gives something like a record of every transaction, though it is harder.
Gok Demir
+1  A: 

About 20 years ago, I learned that the best way to deal with this kind of information is by doing only inserts into the database. You should not modify existing records and definitely not delete any records. Basically, what you would be storing is not the data itself but the modifications to the data. As a result, you could end up with a single table that does everything you want, if only you had enough CPU/disk speed to walk through all the inserts and reconstruct the current data from them.

Storing just the modifications allows you to keep a complete history of all your data, which makes you extremely flexible. Basically, you'd only need one table that contains everything. The drawback is that you need to do a lot more calculations and a lot more disk reads, so you need tricks to speed the process up. Flexibility at the cost of performance...

In this case, you would end up with a similar problem. A single table would be accessed by anything that needs to add or update records; it would be your bottleneck. It's still great in a single-user environment, since only one user would be using it. In a low-user environment, it could still perform very well. But if you have 250+ users accessing this table all the time, it will decrease the overall performance.

Plus, you'll introduce a maintenance problem when someone deletes records. In those cases, the records in this log table also need to be deleted.

This brings me back to the thing I mentioned in the beginning... What you could do is combine both the history table and your regular tables. Whenever a record is inserted, modified or deleted, you would add a record to this modifications table, including a timestamp and user reference. This would maintain the history of your data, and in general you'd only be doing inserts into this table, which should still be fast. As an added bonus, you can recreate the whole contents of your database simply by replaying all actions inside this modifications table, just in case.
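A minimal sketch of such a modifications table (all names here are assumed; the payload column is one possible way to make the replay idea workable):

CREATE TABLE modifications (
    mod_id      BIGINT AUTO_INCREMENT PRIMARY KEY,
    mod_date    TIMESTAMP NOT NULL,
    mod_user_id INT NOT NULL,
    mod_type    CHAR(1) NOT NULL,     -- 'I', 'U' or 'D'
    table_name  VARCHAR(64) NOT NULL, -- which table was touched
    row_pk      BIGINT NOT NULL,      -- primary key of the affected row
    row_data    TEXT NULL             -- changed values, so actions can be replayed
);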

Inserts can be reasonably fast, so the loss of performance is reasonably small, although this depends on your implementation.

But for whatever reason, after learning about modification tables, I never came across a practical situation where I could actually use this technique.

Workshop Alex
+3  A: 

A single log is a wonderful thing.

On each table, have an ID column just for logging purposes. Call it LOG_ID or something.

Any time you do an INSERT, UPDATE or DELETE, it works like this (a sketch in SQL follows the steps).

  1. INSERT the Log entry, get the LOG_ID that was assigned.

  2. Do the INSERT or UPDATE, setting the LOG_ID foreign key on the changed row. For DELETEs you have two choices: actually delete the row or flag the row as "inactive" but don't actually delete it. This second choice makes your log of all changes perfectly complete, but makes your tables quite large and slow because of inactive rows that must be skipped.

  3. Commit.
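A minimal MySQL sketch of the three steps (the change_log and product tables and their columns are assumptions, not from the answer):

START TRANSACTION;

-- 1. insert the log entry and capture its id
INSERT INTO change_log (change_date, change_user_id)
VALUES (NOW(), 42);
SET @log_id = LAST_INSERT_ID();

-- 2. do the actual change, pointing the row at the log entry
UPDATE product
   SET price  = 9.99,
       log_id = @log_id
 WHERE product_id = 1001;

-- 3. commit both together
COMMIT;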

Be sure that your log design can include the following kinds of information (a possible schema follows the list).

  1. Database row changes (Insert, Update, Delete). The Insert and Update changes will have an FK reference to the changed row somewhere. Be sure to include the table name so that an application program can locate the table correctly. Delete changes will only have a table name.

  2. Other processing information like batch job runs. That way you can log batch job start/stop and run times and preserve complete history of processing.
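One possible schema that accommodates both kinds of entries (a sketch; all names are assumed):

CREATE TABLE change_log (
    log_id         BIGINT AUTO_INCREMENT PRIMARY KEY,
    change_date    TIMESTAMP NOT NULL,
    change_user_id INT NOT NULL,
    change_type    CHAR(1) NOT NULL,  -- 'I', 'U', 'D' or 'B' (batch job)
    table_name     VARCHAR(64) NULL,  -- set for row changes, NULL for batch entries
    job_name       VARCHAR(64) NULL,  -- set for batch entries only
    job_start      DATETIME NULL,
    job_end        DATETIME NULL
);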

S.Lott
@S.Lott: if a user updates multiple columns, how do you handle it? Do you record every updated column in a different row?
Gok Demir
The original design did not record column changes. Why add that to this design? Record the exact same information: change date and user name.
S.Lott
Yes, the original design doesn't do it. Actually, for the single log table, my plan is to keep only one record for each row. If a user updates a row, it has a log_id that refers to a single row in the single log table, and that row is updated with the timestamp and user id. When you mentioned including table names and the type of modification, I guessed you store values with them. My proposed design does not include any of this, just the schema I showed above. Anyway, thanks for the reply.
Gok Demir
@Gok Demir: do not update the log. Insert. For a given table, you can find all modifications. For a given row, also, you can find the history of modifications.
S.Lott
+1  A: 

we typically use these on most tables:

LastChgID      int
LastChgDate    datetime

sometimes we will use this on a few:

CreateID       int
CreateDate     datetime
LastChgID      int
LastChgDate    datetime

on others there is a complete mirror table where we log every column with a change type flag, a datetime, and the user id.
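As an illustration of the mirror-table approach, for a hypothetical orders table the mirror might look like this (all names assumed):

CREATE TABLE orders_log (
    order_id    INT,             -- every column of the base table...
    customer_id INT,
    total       DECIMAL(10,2),
    chg_type    CHAR(1),         -- ...plus an I/U/D change type flag
    chg_date    DATETIME,
    chg_id      INT              -- the user id
);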

I'd stay away from a table you update all the time; just add the columns. That is what a database is for: to store information. We HAD a table that was incremented (with an update) each time a stored procedure was called, and it was a deadlock magnet.

We do have a general log table, but it is insert-only and contains debug and error info. It is not updated every time any row is changed, only when a developer decides to write something there: mainly a few header fields (identity, datetime, user id, procedure called from) with a long formatted string containing the message.

KM
Deadlock magnet: yes, this is the issue to consider.
Gok Demir