In my SQL Server backend for my app, I want to create history tables for a bunch of my key tables, which will track a history of changes to the rows.

My entire application uses Stored Procedures, there is no embedded SQL. The only connection to the database to modify these tables will be through the application and the SP interface. Traditionally, shops I've worked with have performed this task using triggers.

If I have a choice between Stored Procedures and Triggers, which is better? Which is faster?

+2  A: 

Use triggers for this. Any change, regardless of its source, will then be reflected in the history table. That is good for security, and it is resilient to failure modes such as someone forgetting to add the code that updates the history table.

There is not likely to be any particular speed difference in either for this type of operation as execution time will be dominated by the I/O.

ConcernedOfTunbridgeWells
But it's very difficult usually to figure out who caused the change, and what they were doing at the time.
le dorfier
You can capture session and login information in the trigger and log it in the audit tables.
ConcernedOfTunbridgeWells
Unless you have a web app where you may not know the browser user, even on an intranet
gbn
A: 

Triggers. Right now you might be able to say that the only way data is updated is through your SPs, but things can change, or you might need to do a mass insert/update for which the SPs would be too cumbersome. Go with triggers.

xando
But row-based triggers for mass updates could harm performance. A mass update should consist of turning the trigger off, performing the update, performing a second mass update to do what the trigger would have done, and then re-enabling the trigger.
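The sequence described in this comment might look roughly like the sketch below. It assumes the dbo.Nodes table, the LogUpdate_Nodes trigger, and the AuditLog table from the audit-trigger answer on this page; the bulk change itself (replacing 'Ltd.' in captions) is a made-up example, and the OUTPUT clause requires SQL Server 2005 or later.

```sql
BEGIN TRANSACTION;

-- Turn the audit trigger off. Inside a transaction this holds a schema lock
-- on dbo.Nodes until COMMIT, so other sessions cannot slip in unaudited changes.
ALTER TABLE dbo.Nodes DISABLE TRIGGER LogUpdate_Nodes;

-- Holds the old and new values of every row the bulk update touches.
DECLARE @Changed TABLE (
    NodeGUID       uniqueidentifier NOT NULL,
    ParentNodeGUID uniqueidentifier NULL,
    OldCaption     varchar(255)     NULL,
    NewCaption     varchar(255)     NULL
);

-- Hypothetical bulk change; OUTPUT captures before/after values in one pass.
UPDATE dbo.Nodes
SET Caption = REPLACE(Caption, 'Ltd.', 'Limited')
OUTPUT inserted.NodeGUID, inserted.ParentNodeGUID, deleted.Caption, inserted.Caption
INTO @Changed (NodeGUID, ParentNodeGUID, OldCaption, NewCaption)
WHERE Caption LIKE '%Ltd.%';

-- Do by hand, in one set-based insert, what the trigger would have done.
INSERT INTO AuditLog (
    ChangeDate, RowGUID, ChangeType,
    Username, HostName, AppName,
    UserGUID,
    TableName, FieldName,
    TagGUID, Tag,
    OldValue, NewValue)
SELECT
    GETDATE(), c.NodeGUID, 'UPDATED',
    USER_NAME(), HOST_NAME(), APP_NAME(),
    NULL,                       -- no application user for a maintenance job
    'Nodes', 'Caption',
    c.ParentNodeGUID, c.NewCaption,
    c.OldCaption, c.NewCaption
FROM @Changed c;

ALTER TABLE dbo.Nodes ENABLE TRIGGER LogUpdate_Nodes;

COMMIT TRANSACTION;
```

Note that SQL Server triggers fire once per statement, with all affected rows visible in the inserted and deleted tables, so whether this detour pays off depends on how much work the trigger does per statement rather than on a per-row invocation cost.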
Neil Barnwell
+12  A: 

In SQL Server 2008, a new feature called Change Data Capture (CDC; see CDC on MSDN) can help. CDC records changes to table data in another table without triggers or any other hand-written mechanism. It captures inserts, updates, and deletes against a table and makes the details of those changes available in relational format.
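As a rough sketch of what turning CDC on looks like, the batches below enable it for a database and for one table, then read back the captured changes. The table dbo.Nodes is only an assumed example; the procedures and functions are the documented CDC ones, and SQL Server Agent must be running for the capture job to populate the change table.

```sql
-- Enable CDC for the current database (requires sysadmin).
EXEC sys.sp_cdc_enable_db;
GO

-- Enable CDC for one table. dbo.Nodes is an assumed example table.
-- @role_name = NULL means no gating role restricts access to the change data.
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Nodes',
    @role_name     = NULL;
GO

-- Read every change captured so far.
-- The default capture instance name is <schema>_<table>, here dbo_Nodes.
DECLARE @from_lsn binary(10), @to_lsn binary(10);
SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_Nodes');
SET @to_lsn   = sys.fn_cdc_get_max_lsn();

SELECT
    sys.fn_cdc_map_lsn_to_time(__$start_lsn) AS ChangeTime,
    __$operation,  -- 1 = delete, 2 = insert, 3 = update (old), 4 = update (new)
    *
FROM cdc.fn_cdc_get_all_changes_dbo_Nodes(@from_lsn, @to_lsn, N'all update old');
GO
```

Because capture reads the transaction log asynchronously, a change may take a moment to show up in the query results.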

Channel9 video

ashish jaiman
Exactly like triggers, but not called triggers.
Ian Boyd
I was looking forward to this feature of 2008 but was disappointed to learn that "Change data capture is available only on the Enterprise, Developer, and Evaluation editions of SQL Server."
Funka
CDC doesn't seem to record dates, times, login name, host, spid, etc. And I don't know, but I'll bet that there is no GUI to manage it. And I also assume that you cannot modify a table after enabling CDC (i.e. adding, removing, or renaming a column).
Ian Boyd
+3  A: 

As everyone else said, Triggers. They are easier to unit test and far more resilient to power users with unexpected access directly to the tables making random queries.

As for faster? Determining what is fast inside a database is a hard problem with a large number of variables. Short of "try it both ways and compare" you are not going to get a useful answer to which method is faster. The variables include the size of the tables involved, the normal pattern of updates, the speed of the disks in the server, the amount of memory, the amount of memory devoted to caching, etc. The list is endless, and each variable affects whether triggers are faster than custom SQL inside the SP.

Good. Fast. Cheap. Pick two. Triggers are Good in terms of integrity and probably Cheap in terms of maintenance. Arguably they are also Fast in that once they work, you are done with them. SPs are a maintenance issue and pushing stuff into maintenance can be Fast, but is never Good or Cheap.

Good Luck.

jmucchiello
+1  A: 

The recommended approach depends on your requirements. If the history table is there as an audit trail, you need to capture each operation. If the history table exists only for performance reasons, then a scheduled SQL Agent data transfer job should be enough.

For capturing each operation use either AFTER TRIGGERs or Change Data Capture.

AFTER triggers give you two special pseudo-tables to work with inside the trigger:

  • INSERTED holds the new row values after an INSERT or UPDATE
  • DELETED holds the old row values after a DELETE or UPDATE

You can insert into the history table from these pseudo-tables, and your history table will always be up to date. You might want to add version numbering, time stamps, or both to the history table to distinguish successive changes to a single source row.
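A minimal sketch of such a history trigger follows. The tables dbo.Customer and dbo.CustomerHistory and all their columns are hypothetical, chosen only to illustrate the pattern; the identity column serves as the version order and ChangedAt as the time stamp.

```sql
-- Hypothetical source table.
CREATE TABLE dbo.Customer (
    CustomerID int          NOT NULL PRIMARY KEY,
    Name       varchar(100) NOT NULL,
    Email      varchar(255) NULL
);
GO

-- History table: one row per change to a source row.
CREATE TABLE dbo.CustomerHistory (
    HistoryID  int IDENTITY(1, 1) NOT NULL PRIMARY KEY,  -- orders the versions
    CustomerID int          NOT NULL,
    Name       varchar(100) NOT NULL,
    Email      varchar(255) NULL,
    Operation  char(1)      NOT NULL,                    -- 'I', 'U' or 'D'
    ChangedAt  datetime     NOT NULL DEFAULT (GETDATE())
);
GO

CREATE TRIGGER dbo.trg_Customer_History ON dbo.Customer
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- INSERT and UPDATE: store the new row image from inserted.
    -- An UPDATE is recognisable because deleted is populated as well.
    INSERT INTO dbo.CustomerHistory (CustomerID, Name, Email, Operation)
    SELECT i.CustomerID, i.Name, i.Email,
           CASE WHEN EXISTS (SELECT * FROM deleted) THEN 'U' ELSE 'I' END
    FROM inserted i;

    -- DELETE: inserted is empty, so store the last row image from deleted.
    INSERT INTO dbo.CustomerHistory (CustomerID, Name, Email, Operation)
    SELECT d.CustomerID, d.Name, d.Email, 'D'
    FROM deleted d
    WHERE NOT EXISTS (SELECT * FROM inserted);
END
GO
```

The trigger is set-based: a statement that touches many rows fires it once, and every affected row gets its own history row.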

Change Data Capture (CDC) is designed for creating a delta table that you can use as a source for loading data into a data warehouse (or a history table). Unlike triggers, CDC is asynchronous and you can use any method and scheduling for populating your destination (sprocs, SSIS).

You can access both the original data and the changes with CDC. Change Tracking (CT) only detects which rows changed. It is possible to construct a complete audit trail with CDC but not with CT. In SQL Server 2008, CDC is available only in the Enterprise, Developer, and Evaluation editions; Change Tracking is available in all editions.
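To illustrate what "only detects changed rows" means in practice, here is a small Change Tracking sketch. The database name MyAppDb is hypothetical, and dbo.Nodes is assumed to have NodeGUID as its primary key (CT requires a primary key on the tracked table).

```sql
-- Turn Change Tracking on for the database (MyAppDb is a placeholder name).
ALTER DATABASE MyAppDb
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);
GO

-- Track one table. The table must have a primary key.
ALTER TABLE dbo.Nodes
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = OFF);
GO

-- Ask which rows changed since a given version (0 = since tracking began).
DECLARE @last_sync_version bigint;
SET @last_sync_version = 0;

SELECT ct.NodeGUID,              -- primary key of the changed row
       ct.SYS_CHANGE_OPERATION,  -- 'I', 'U' or 'D'
       ct.SYS_CHANGE_VERSION
FROM CHANGETABLE(CHANGES dbo.Nodes, @last_sync_version) AS ct;

-- Remember this value and pass it as @last_sync_version next time.
SELECT CHANGE_TRACKING_CURRENT_VERSION() AS CurrentVersion;
GO
```

The result tells you that a row changed and how, but not what the old values were, which is why it cannot serve as a full audit trail on its own.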

mika
+2  A: 

Be very careful to identify your intended use cases for this table, and make sure it is constructed properly for that purpose.

Specifically, if it's for an operational audit trail for stakeholders, that's quite different from before-and-after snapshots of record changes in tables. (In fact, I have a difficult time imagining a good use for record changes, other than debugging.)

An audit trail normally requires, at minimum, a user id, a timestamp, and an operation code - and probably some detail about the operation. Example - change the ordered quantity on a line item on a purchase order.

And for this type of audit trail you do not want to use triggers. The higher in the BR layer you embed the generation of these events, the better.
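As a sketch of the idea, the stored procedure below records a business-level event (user, timestamp, operation code, detail) in the same transaction as the change itself. All table, column, and procedure names here are hypothetical, and a stored procedure is only one possible place for this; the point above is that the event is generated by code that knows the business operation, not by a trigger that only sees rows.

```sql
-- Hypothetical tables for the purchase-order example.
CREATE TABLE dbo.PurchaseOrderLine (
    LineItemID int NOT NULL PRIMARY KEY,
    Quantity   int NOT NULL
);
CREATE TABLE dbo.BusinessAudit (
    AuditID    int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    UserID     int           NOT NULL,
    OccurredAt datetime      NOT NULL DEFAULT (GETDATE()),
    Operation  varchar(50)   NOT NULL,   -- operation code
    Detail     varchar(1000) NULL
);
GO

CREATE PROCEDURE dbo.ChangeOrderedQuantity
    @UserID int, @LineItemID int, @NewQuantity int
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;  -- a run-time error rolls back the whole transaction

    DECLARE @OldQuantity int;

    BEGIN TRANSACTION;

    -- Read the current value and keep the row locked until we commit.
    SELECT @OldQuantity = Quantity
    FROM dbo.PurchaseOrderLine WITH (UPDLOCK)
    WHERE LineItemID = @LineItemID;

    UPDATE dbo.PurchaseOrderLine
    SET Quantity = @NewQuantity
    WHERE LineItemID = @LineItemID;

    IF @@ROWCOUNT = 0
    BEGIN
        ROLLBACK TRANSACTION;
        RAISERROR('Line item %d not found.', 16, 1, @LineItemID);
        RETURN 1;
    END

    -- The audit row describes the business operation, not the column change.
    INSERT INTO dbo.BusinessAudit (UserID, Operation, Detail)
    VALUES (@UserID, 'CHANGE_ORDERED_QTY',
            'Line item ' + CAST(@LineItemID AS varchar(12))
            + ': quantity ' + CAST(@OldQuantity AS varchar(12))
            + ' -> ' + CAST(@NewQuantity AS varchar(12)));

    COMMIT TRANSACTION;
    RETURN 0;
END
GO
```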

OTOH, for record-level changes, triggers are the right match. But it's also often easier to get this from your dbms journaling files.

le dorfier
A: 

It depends on the nature of the application and on the table structure: number of indexes, data size, foreign keys, and so on. If these are relatively simple tables (no indexes, or only a few such as indexes on datetime/integer columns) with a limited data set (< 1 million rows), you will probably be OK using triggers.

Keep in mind that triggers can be a source of locking issues. I would assume that if you are using the history tables as a type of audit trail, you will be indexing them for future reference. If the trigger writes to a history table that is slow to insert into, update, or delete from because of its indexes, the procedure call will block until the trigger finishes. Also, if the trigger touches any foreign key constraints, this could hamper performance as well.

In this case it all depends on the table indexes. We use SQL Server 2000 for a 24/7 app that processes over 100K financial transactions per day. The largest/main table has over 100 million rows and 15 indexes (mass deletes are not reasonably possible if uptime is desired). Even though all SQL is done in stored procedures, we do not use triggers or foreign keys because of the performance hit.

aurealus
+18  A: 

Triggers.

We wrote a GUI (internally called Red Matrix Reloaded) to allow easy creation/management of audit logging triggers.

Here's some DDL of the stuff used:


The AuditLog table

CREATE TABLE [AuditLog] (
    [AuditLogID] [int] IDENTITY (1, 1) NOT NULL ,
    [ChangeDate] [datetime] NOT NULL CONSTRAINT [DF_AuditLog_ChangeDate] DEFAULT (getdate()),
    [RowGUID] [uniqueidentifier] NOT NULL ,
    [ChangeType] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
    [TableName] [varchar] (128) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
    [FieldName] [varchar] (128) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
    [OldValue] [varchar] (8000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [NewValue] [varchar] (8000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [Username] [varchar] (128) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
    [Hostname] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
    [AppName] [varchar] (128) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [UserGUID] [uniqueidentifier] NULL ,
    [TagGUID] [uniqueidentifier] NULL ,
    [Tag] [varchar] (8000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL 
)


Trigger to log inserts

CREATE TRIGGER LogInsert_Nodes ON dbo.Nodes
FOR INSERT
AS

/* Load the saved context info UserGUID */
DECLARE @SavedUserGUID uniqueidentifier

SELECT @SavedUserGUID = CAST(context_info as uniqueidentifier)
FROM master.dbo.sysprocesses
WHERE spid = @@SPID

DECLARE @NullGUID uniqueidentifier
SELECT @NullGUID = '{00000000-0000-0000-0000-000000000000}'

IF @SavedUserGUID = @NullGUID
BEGIN
    SET @SavedUserGUID = NULL
END

    /*We don't log individual field changes Old/New because the row is new.
    So we only have one record - INSERTED*/

    INSERT INTO AuditLog(
      ChangeDate, RowGUID, ChangeType, 
      Username, HostName, AppName,
      UserGUID, 
      TableName, FieldName, 
      TagGUID, Tag, 
      OldValue, NewValue)

    SELECT
     getdate(), --ChangeDate
     i.NodeGUID, --RowGUID
     'INSERTED', --ChangeType
     USER_NAME(), HOST_NAME(), APP_NAME(), 
     @SavedUserGUID, --UserGUID
     'Nodes', --TableName
     '', --FieldName
     i.ParentNodeGUID, --TagGUID
     i.Caption, --Tag
     null, --OldValue
     null --NewValue
    FROM Inserted i


Trigger to log Updates

CREATE TRIGGER LogUpdate_Nodes ON dbo.Nodes
FOR UPDATE AS

/* Load the saved context info UserGUID */
DECLARE @SavedUserGUID uniqueidentifier

SELECT @SavedUserGUID = CAST(context_info as uniqueidentifier)
FROM master.dbo.sysprocesses
WHERE spid = @@SPID

DECLARE @NullGUID uniqueidentifier
SELECT @NullGUID = '{00000000-0000-0000-0000-000000000000}'

IF @SavedUserGUID = @NullGUID
BEGIN
    SET @SavedUserGUID = NULL
END

    /* ParentNodeGUID uniqueidentifier */
    IF UPDATE (ParentNodeGUID)
    BEGIN
     INSERT INTO AuditLog(
      ChangeDate, RowGUID, ChangeType, 
      Username, HostName, AppName,
      UserGUID, 
      TableName, FieldName, 
      TagGUID, Tag, 
      OldValue, NewValue)
     SELECT 
      getdate(), --ChangeDate
      i.NodeGUID, --RowGUID
      'UPDATED', --ChangeType
      USER_NAME(), HOST_NAME(), APP_NAME(), 
      @SavedUserGUID, --UserGUID
      'Nodes', --TableName
      'ParentNodeGUID', --FieldName
      i.ParentNodeGUID, --TagGUID
      i.Caption, --Tag
      d.ParentNodeGUID, --OldValue
      i.ParentNodeGUID --NewValue
     FROM Inserted i
      INNER JOIN Deleted d
      ON i.NodeGUID = d.NodeGUID
     WHERE (d.ParentNodeGUID IS NULL AND i.ParentNodeGUID IS NOT NULL)
     OR (d.ParentNodeGUID IS NOT NULL AND i.ParentNodeGUID IS NULL)
     OR (d.ParentNodeGUID <> i.ParentNodeGUID)
    END

    /* Caption varchar(255) */
    IF UPDATE (Caption)
    BEGIN
     INSERT INTO AuditLog(
      ChangeDate, RowGUID, ChangeType, 
      Username, HostName, AppName,
      UserGUID, 
      TableName, FieldName, 
      TagGUID, Tag, 
      OldValue, NewValue)
     SELECT 
      getdate(), --ChangeDate
      i.NodeGUID, --RowGUID
      'UPDATED', --ChangeType
      USER_NAME(), HOST_NAME(), APP_NAME(), 
      @SavedUserGUID, --UserGUID
      'Nodes', --TableName
      'Caption', --FieldName
      i.ParentNodeGUID, --TagGUID
      i.Caption, --Tag
      d.Caption, --OldValue
      i.Caption --NewValue
     FROM Inserted i
      INNER JOIN Deleted d
      ON i.NodeGUID = d.NodeGUID
     WHERE (d.Caption IS NULL AND i.Caption IS NOT NULL)
     OR (d.Caption IS NOT NULL AND i.Caption IS NULL)
     OR (d.Caption <> i.Caption)
    END

...

/* ImageGUID uniqueidentifier */
IF UPDATE (ImageGUID)
BEGIN
 INSERT INTO AuditLog(
  ChangeDate, RowGUID, ChangeType, 
  Username, HostName, AppName,
  UserGUID, 
  TableName, FieldName, 
  TagGUID, Tag, 
  OldValue, NewValue)
 SELECT 
  getdate(), --ChangeDate
  i.NodeGUID, --RowGUID
  'UPDATED', --ChangeType
  USER_NAME(), HOST_NAME(), APP_NAME(), 
  @SavedUserGUID, --UserGUID
  'Nodes', --TableName
  'ImageGUID', --FieldName
  i.ParentNodeGUID, --TagGUID
  i.Caption, --Tag
  (SELECT Caption FROM Nodes WHERE NodeGUID = d.ImageGUID), --OldValue
  (SELECT Caption FROM Nodes WHERE NodeGUID = i.ImageGUID) --New Value
 FROM Inserted i
  INNER JOIN Deleted d
  ON i.NodeGUID = d.NodeGUID
 WHERE (d.ImageGUID IS NULL AND i.ImageGUID IS NOT NULL)
 OR (d.ImageGUID IS NOT NULL AND i.ImageGUID IS NULL)
 OR (d.ImageGUID <> i.ImageGUID)
END


Trigger to log Delete

CREATE TRIGGER LogDelete_Nodes ON dbo.Nodes
FOR DELETE
AS

/* Load the saved context info UserGUID */
DECLARE @SavedUserGUID uniqueidentifier

SELECT @SavedUserGUID = CAST(context_info as uniqueidentifier)
FROM master.dbo.sysprocesses
WHERE spid = @@SPID

DECLARE @NullGUID uniqueidentifier
SELECT @NullGUID = '{00000000-0000-0000-0000-000000000000}'

IF @SavedUserGUID = @NullGUID
BEGIN
    SET @SavedUserGUID = NULL
END

    /*We don't log individual field changes Old/New because the row is being deleted.
    So we only have one record - DELETED*/

    INSERT INTO AuditLog(
      ChangeDate, RowGUID, ChangeType, 
      Username, HostName, AppName,
      UserGUID, 
      TableName, FieldName, 
      TagGUID, Tag, 
      OldValue,NewValue)

    SELECT
     getdate(), --ChangeDate
     d.NodeGUID, --RowGUID
     'DELETED', --ChangeType
     USER_NAME(), HOST_NAME(), APP_NAME(), 
     @SavedUserGUID, --UserGUID
     'Nodes', --TableName
     '', --FieldName
     d.ParentNodeGUID, --TagGUID
     d.Caption, --Tag
     null, --OldValue
     null --NewValue
    FROM Deleted d


And in order to know which user in the software did the update, every connection "logs itself onto SQL Server" by calling a stored procedure:

CREATE PROCEDURE dbo.SaveContextUserGUID @UserGUID uniqueidentifier AS

/* Saves the given UserGUID as the session's "Context Information" */
IF @UserGUID IS NULL
BEGIN
    PRINT 'Emptying CONTEXT_INFO because of null @UserGUID'
    DECLARE @BinVar varbinary(128)
    SET @BinVar = CAST( REPLICATE( 0x00, 128 ) AS varbinary(128) )
    SET CONTEXT_INFO @BinVar
    RETURN 0
END

DECLARE @UserGUIDBinary binary(16) --a guid is 16 bytes
SELECT @UserGUIDBinary = CAST(@UserGUID as binary(16))
SET CONTEXT_INFO @UserGUIDBinary


/* To load the guid back 
DECLARE @SavedUserGUID uniqueidentifier

SELECT @SavedUserGUID = CAST(context_info as uniqueidentifier)
FROM master.dbo.sysprocesses
WHERE spid = @@SPID

select @SavedUserGUID AS UserGUID
*/
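A short usage sketch, assuming the procedure, triggers, and AuditLog table above are in place. Both GUID values are made up for illustration.

```sql
-- Right after connecting, the application identifies its user.
EXEC dbo.SaveContextUserGUID @UserGUID = '6F9619FF-8B86-D011-B42D-00C04FC964FF';

-- Any later change on this connection is attributed to that user,
-- because the triggers read the GUID back from the session's context info.
UPDATE dbo.Nodes
SET Caption = 'Cerberus Capital Management'
WHERE NodeGUID = '0E984725-C51C-4BF4-9960-E1C80E27ABA0';

-- Inspect the most recent audit entries.
SELECT TOP 10 ChangeDate, ChangeType, FieldName, OldValue, NewValue, UserGUID
FROM AuditLog
ORDER BY AuditLogID DESC;
```

Context info belongs to the session, so with connection pooling the application would need to call the procedure again each time it picks up a connection for a different user.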


Notes

  • Stack Overflow's code format removes most blank lines, so the formatting sucks.
  • We use a table of users, not integrated security.
  • This code is provided as a convenience - no criticism of our design selection allowed. Purists might insist that all logging code should be done in the business layer - they can come here and write/maintain it for us.
  • Blobs cannot be logged using triggers in SQL Server (there is no "before" version of a blob - there is only what is). Text and nText are blobs - which makes notes either unloggable, or makes them varchar(2000)'s.
  • The Tag column is used as arbitrary text to identify the row (e.g. if a customer was deleted, the tag will show "General Motors North America" in the audit log table).
  • TagGUID is used to point to the row's "parent". For example, logging InvoiceLineItems points back to the InvoiceHeader. This way anyone searching for audit log entries related to a specific invoice will find the deleted "line items" by the line item's TagGUID in the audit trail.
  • Sometimes the "OldValue" and "NewValue" values are written as a sub-select to get a meaningful string. For example:

    OldValue: {233d-ad34234..} NewValue: {883-sdf34...}

is less useful in the audit trail than:

OldValue: Daimler Chrysler
NewValue: Cerberus Capital Management

Final note: Feel free to not do what we do. This is great for us, but everyone else is free to not use it.

Ian Boyd
This example was extremely helpful. Thanks a bunch.
EndangeredMassa
Soooooo...if you liked it, can it be the answer?
Ian Boyd