views: 62 · answers: 5

Hey, so a bit of a design question here:

I am working on a project that requires me to keep track of when a user performs an Insert, Update, or Delete. The thing I am trying to decide on is whether to have the Logs table reference the Primary Key of the affected row or some other identifier. In order to log a deletion when using Foreign Keys I would need to have a Deleted column for every table. However, if I was using some named identifier without a Foreign Key I will eventually run into name duplication in the Logs table and it will become unclear what the entry refers to. Does anyone have any practical experience with this, specifically the performance impact of using a Deleted column in order to maintain complete Logs?

A related question is also in the design of the Logs table itself. If I go with Foreign Key references in a single Logs table, my first instinct would be to create a reference column for each table that is being monitored for changes. This doesn't feel ideal to me, since if I have to add a new table down the line I then have to change the Logs table and any associated sprocs. One alternative I can see is to have two columns, TableName and RowId. But this will have no inherent Foreign Key references, since it isn't known what table is being referred to.

Thanks for any input!

A: 

Wouldn't you just need four columns in the Log table? UserId, TableName, Id, Action.

Action would be "delete", "update" or "insert", Id would be the primary key of the table in question and the rest, well, are self explanatory.

This way you don't have to have a lot of columns and foreign keys, which will only make inserting into the Log table slower. You are going to need to do this with triggers anyway, so adding the table name to the Log table will not be a problem.
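A minimal sketch of this design in T-SQL (the `Customer` table and its `ModifiedBy` / `CustomerId` columns are illustrative, not from the question):

```sql
-- Minimal audit log: one row per modification, no per-table FK columns.
CREATE TABLE AuditLog (
    LogId     INT IDENTITY(1,1) PRIMARY KEY,
    UserId    INT         NOT NULL,
    TableName SYSNAME     NOT NULL,
    RowId     INT         NOT NULL,   -- PK of the affected row
    Action    VARCHAR(10) NOT NULL,   -- 'insert' | 'update' | 'delete'
    LoggedAt  DATETIME    NOT NULL DEFAULT GETDATE()
);
GO

-- Example trigger for a hypothetical Customer table.
CREATE TRIGGER trg_Customer_Audit
ON Customer
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- Rows in both inserted and deleted => update;
    -- only inserted => insert; only deleted => delete.
    INSERT INTO AuditLog (UserId, TableName, RowId, Action)
    SELECT COALESCE(i.ModifiedBy, d.ModifiedBy),
           'Customer',
           COALESCE(i.CustomerId, d.CustomerId),
           CASE WHEN i.CustomerId IS NOT NULL
                 AND d.CustomerId IS NOT NULL THEN 'update'
                WHEN i.CustomerId IS NOT NULL THEN 'insert'
                ELSE 'delete' END
    FROM inserted i
    FULL OUTER JOIN deleted d ON d.CustomerId = i.CustomerId;
END;
```

One such trigger per monitored table, all writing into the same four-column log.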

klausbyskov
-1 - This wouldn't show what columns were updated, and you would lose information on any deletes
Gabriel McAdams
@Gabriel McAdams, I didn't notice that that was a requirement.
klausbyskov
A: 

Assuming the log information for each table you're tracking is identical, I would go with a single log table with TableID and RowPKValue columns. It's true that you cannot FK that RowPKValue back to each individual table, but since the log table will never be touched by users (or any code other than your Log SP) I think it's reasonably safe.

I would definitely use the PK value of the row-being-logged rather than any other value (even if you don't have an FK referencing it) because that's what PK values are for.

As for the DELETE issue, I think it depends on 1) the number of DELETEs you expect to perform and 2) the likelihood that you are going to have to feature "Undelete" as part of the application.

If you have a relatively small number of DELETEs, you can keep them in the table with a deleted flag and then remove them when some specified period is up. If you do this, I recommend a scheme in which the base table is called something like CustomerAll and you have a single table view CREATE VIEW Customer AS SELECT * FROM CustomerAll WHERE Deleted = false for the front end programmers to use.
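A sketch of that scheme, assuming a simple `CustomerAll` table (T-SQL, where the bit column uses `0`/`1` rather than a `false` literal):

```sql
-- Base table holds every row, including logically deleted ones.
CREATE TABLE CustomerAll (
    CustomerId INT IDENTITY(1,1) PRIMARY KEY,
    Name       NVARCHAR(100) NOT NULL,
    Deleted    BIT NOT NULL DEFAULT 0   -- 0 = live, 1 = logically deleted
);
GO

-- Front-end code only ever sees live rows through this view.
CREATE VIEW Customer AS
SELECT CustomerId, Name
FROM CustomerAll
WHERE Deleted = 0;
GO

-- A "delete" from the application's point of view:
-- UPDATE CustomerAll SET Deleted = 1 WHERE CustomerId = @Id;
```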

If you have a large number of DELETEs, I would move them upon DELETE into a second table (CustomersDeleted) or even out of the database, depending on how likely you think it is you'll ever need to look them up again.

Larry Lustig
+1  A: 

First, depending on the number of rows and what kind of indexes you have, adding the deleted column might perform better than deleting the row.

Second, I think the best option is to store the entire row in the log table (this will allow any updates to be logged as well). You could also do this with a more normalized log table - something like this:

  • ID
  • Action
  • AffectedColumn
  • OldValue
  • NewValue

That would be a lot more work, though, and you wouldn't get that much out of it. I would recommend you simply store all data in a duplicate table - with the additional columns to show the kind of action that was taken.
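A sketch of the duplicate ("shadow") table approach for a hypothetical `Customer (CustomerId, Name)` table, using the trigger's `inserted`/`deleted` pseudo-tables:

```sql
-- Shadow table: same columns as Customer plus audit metadata.
CREATE TABLE CustomerLog (
    CustomerLogId INT IDENTITY(1,1) PRIMARY KEY,
    CustomerId    INT           NOT NULL,
    Name          NVARCHAR(100) NULL,
    Action        CHAR(1)       NOT NULL,  -- 'I', 'U', or 'D'
    LoggedAt      DATETIME      NOT NULL DEFAULT GETDATE()
);
GO

CREATE TRIGGER trg_Customer_Shadow
ON Customer
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- On UPDATE both pseudo-tables have rows; log the new values as 'U'.
    IF EXISTS (SELECT 1 FROM inserted) AND EXISTS (SELECT 1 FROM deleted)
        INSERT INTO CustomerLog (CustomerId, Name, Action)
        SELECT CustomerId, Name, 'U' FROM inserted;
    ELSE IF EXISTS (SELECT 1 FROM inserted)
        INSERT INTO CustomerLog (CustomerId, Name, Action)
        SELECT CustomerId, Name, 'I' FROM inserted;
    ELSE
        INSERT INTO CustomerLog (CustomerId, Name, Action)
        SELECT CustomerId, Name, 'D' FROM deleted;
END;
```

Deletes keep a full copy of the row in the shadow table, so no information is lost.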

As for referencing the row's identifier or some other value, you would have the same problem either way. In order to identify which row was affected, the same unique value would have to exist in that row as well. As soon as that row is deleted, so is that unique value (a GUID, uniqueidentifier in T-SQL, is an option, by the way).

Gabriel McAdams
+1 - especially for the duplicate table suggestion. Though it might increase the number of tables, I think it's not at all a bad option to consider!
InSane
A: 

You can use a 2 table log file setup as well.

LogA - Log_id,Tablename, action, date_time

LogB - Log_ID,table_id,columnchanged,old_value,new_value

(edit to clarify: 'table_id' above refers to whatever the primary key is in that table. Composite primary keys may require more than one field here.)

Use a null value in the new_value field to represent a delete and a null in the old value to represent an insert.
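A possible T-SQL shape for those two tables (column types are guesses; the NULL convention above is encoded in the comments):

```sql
CREATE TABLE LogA (
    Log_id    INT IDENTITY(1,1) PRIMARY KEY,
    TableName SYSNAME     NOT NULL,
    Action    VARCHAR(10) NOT NULL,
    Date_Time DATETIME    NOT NULL DEFAULT GETDATE()
);
GO

CREATE TABLE LogB (
    Log_ID        INT           NOT NULL REFERENCES LogA (Log_id),
    Table_Id      INT           NOT NULL,  -- PK value of the changed row
    ColumnChanged SYSNAME       NOT NULL,
    Old_Value     NVARCHAR(MAX) NULL,      -- NULL here marks an insert
    New_Value     NVARCHAR(MAX) NULL       -- NULL here marks a delete
);
```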

If you prefer to avoid a 'deleted' column on each table, you can create a deleted table that stores which rows were deleted and then use a view to show only active records (i.e. records that don't have their unique key inside the deleted table).

There will be a lot of valid ways of designing this...

M.E.
+1  A: 

RE: one log table instead of shadow tables.

That would be a lot more work, though, and you wouldn't get that much out of it. - Gabriel McAdams

I couldn't disagree more.

Your goal is to:

to keep track of when a user performs an Insert, Update, or Delete.

If you have a shadow table for every table how do you answer these questions:

  1. Which users made changes yesterday?
  2. What table has the most updates?
  3. What activity did employee = "Page, S" do last month?
  4. What employee is the most active?

You'll have to go through EVERY SINGLE shadow table to figure that out. And if you ever start logging a new table, you'll have to remember to change those sprocs to also hit that new shadow table.

If you have one log table as others have suggested, you can answer all of those questions very easily with simple SQL... what's a LITTLE harder to do is UN-DELETE or UN-UPDATE. But you're not building it for that, you're building it to track user edits, and the queries I listed, or ones like them will be your bread-and-butter.
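With a single log table along the lines suggested above (here assumed to be a hypothetical `AuditLog (UserId, TableName, Action, LoggedAt)`), those questions become one-liners:

```sql
-- 1. Which users made changes yesterday?
SELECT DISTINCT UserId
FROM AuditLog
WHERE LoggedAt >= DATEADD(DAY, -1, CAST(GETDATE() AS DATE))
  AND LoggedAt <  CAST(GETDATE() AS DATE);

-- 2. Which table has the most updates?
SELECT TOP 1 TableName, COUNT(*) AS Updates
FROM AuditLog
WHERE Action = 'update'
GROUP BY TableName
ORDER BY Updates DESC;

-- 4. Which user is the most active?
SELECT TOP 1 UserId, COUNT(*) AS Changes
FROM AuditLog
GROUP BY UserId
ORDER BY Changes DESC;
```

With per-table shadow tables, each of these would need a UNION across every shadow table instead.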

Regarding Logical Deletes.

They can be useful if that table is the parent of other tables you CAN'T delete from. Like employee and payroll. You can't delete an employee with payroll records that must be kept by law. If that's why you're logically deleting, great. But if you're logically deleting just so you can keep the PK available to an FK in a log table, then I think you're buying a world of hurt for not a lot of good.

If you have ANY code hitting those tables directly, you'll now have to change it ALL to include a DELETED = 'F' predicate. If you start from scratch, you can make a view for EVERY logically deleted table that embeds the DELETED = 'F' predicate and never grant SELECT on the table itself.

Stephanie Page
I'm going to go with the single log file set up for the time being. As you mentioned, it is more important to me to know the answers to questions like the ones you listed.
rpf3