I'm writing some logging/auditing code that will be running in production (not just when errors are thrown or while developing). After reading Coding Horror's experiences with dead-locking and logging, I decided I should seek advice. (Jeff's solution of "not logging" won't work for me; this is legally mandated security auditing.)

Is there a suitable isolation level for minimizing contention and dead-locking? Are there any query hints I can add to the insert statement or the stored procedure?

I care deeply about the transactional integrity of everything except the audit table. The idea is that so much will be logged that if a few entries fail, it's not a problem. If the logging stops some other transaction, though, that would be bad.

I can log to a database or a file, although logging to a file is less attractive because I need to be able to display the results somehow. Logging to a file would (almost) guarantee the logging wouldn't interfere with other code though.

+1  A: 

Since you don't care about the transactional integrity of the audit table, you can obviously perform logging outside of the transaction (i.e. after it completes). That will minimise impact on the transaction.
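
For example (a minimal sketch only; dbo.OrderHeader, dbo.AuditLog and their columns are hypothetical names, not from the question):

    -- Do the business work in its own transaction...
    DECLARE @OrderId int;
    SET @OrderId = 42;
    BEGIN TRANSACTION;
        UPDATE dbo.OrderHeader SET Status = 'Shipped' WHERE OrderId = @OrderId;
    COMMIT TRANSACTION;

    -- ...then write the audit row in its own implicit transaction, after the
    -- business transaction has released its locks. A failure here cannot roll
    -- back or block the order update.
    INSERT INTO dbo.AuditLog (EventTime, UserName, Detail)
    VALUES (GETUTCDATE(), SUSER_SNAME(),
            'Order ' + CAST(@OrderId AS varchar(20)) + ' shipped');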

Also, if you want to minimize locking, you should try to ensure that as much of your query workload as possible has covering non-clustered indexes. (In SQL Server 2005 and above, the INCLUDE clause in NC indexes can make a big difference.)
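
For instance (a sketch; the table and column names are made up just to illustrate the INCLUDE syntax):

    -- A covering non-clustered index for a query that filters on OrderDate and
    -- only needs CustomerId and Total: the INCLUDEd columns let the index cover
    -- the query without key lookups (SQL Server 2005 and above).
    CREATE NONCLUSTERED INDEX IX_OrderHeader_OrderDate
        ON dbo.OrderHeader (OrderDate)
        INCLUDE (CustomerId, Total);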

Mitch Wheat
Note: if you perform all the logging after a transaction completes, it makes it fairly difficult to debug long running transactions or aborted transactions.
Sam Saffron
The poster has already made it clear that it doesn't matter if the logging fails, as long as a transaction completes. Obviously, transactions should be as short as possible.
Mitch Wheat
All I'm saying is that in this context, logging in a separate thread is probably better than logging after the transaction completes.
Sam Saffron
+1  A: 

One easy way to prevent your logging from having locking issues with your 'regular' database is to not use the same database. Just create another database for your logging. As a bonus, the rapid growth of your logging database won't result in fragmentation in your main DB. Personally, I usually prefer to log to a file -- but then again, I'm used to doing heavy text manipulation in my editor, VIM. Logging to a separate DB should help avoid deadlocking issues.
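
A rough sketch of that setup (the database, table and column names here are hypothetical):

    -- A dedicated audit database; SIMPLE recovery keeps its log maintenance cheap.
    CREATE DATABASE AuditDb;
    ALTER DATABASE AuditDb SET RECOVERY SIMPLE;
    GO
    CREATE TABLE AuditDb.dbo.AuditLog (
        AuditId   int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
        EventTime datetime      NOT NULL DEFAULT (GETUTCDATE()),
        UserName  sysname       NOT NULL DEFAULT (SUSER_SNAME()),
        Detail    nvarchar(max) NULL
    );
    GO
    -- From the main database, log with a three-part name so the audit rows
    -- (and their growth) stay out of the primary database's files.
    INSERT INTO AuditDb.dbo.AuditLog (Detail)
    VALUES (N'user 42 viewed record 17');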

Just make sure that if you write your own database appender for the logging framework you use, you are very careful about your locks (which I'm guessing is what tripped up Jeff in the blog post you reference). Properly written (see several of the comments on Jeff's post), you shouldn't have locking issues with your logging framework unless it does something odd.

Jonathan
I think introducing a whole new database just for a logging table is rather aggressive; you can always control the storage location of each table using filegroups if required anyway. Also note that if you have distributed transactions in place, logging to a separate db can actually make things worse if that operation participates in the distributed transaction.
Sam Saffron
If you're using distributed transactions, I'd advise *not* having your logging enlist in them. Nothing like having something break causing a transaction rollback, and not having any logs of what broke...
Jonathan
+1  A: 

A normal transaction (i.e. READ COMMITTED) insert already does the 'minimal' locking. Insert-intensive applications will not deadlock on the insert, no matter how the inserts are interleaved with other operations. At worst, an intensive insert system may cause page latch contention on the hot spot where the inserts occur, but not deadlocks.

To cause deadlocks as described by Jeff there has to be more at play, like any one of the following:

  • The system is using a higher isolation level (they had it coming then and well deserved it)
  • They were reading from the log table during the transaction (so it was no longer 'append-only')
  • The deadlock chain involved application-layer locks (i.e. .NET lock statements in the log4net framework), resulting in undetectable deadlocks (i.e. application hangs). Given that solving the problem involved looking at process dumps, I guess this is the scenario they were having.

So as long as you do insert-only logging in READ COMMITTED isolation level transactions, you are safe. If you expect the same problem I suspect SO had (i.e. deadlocks involving application-layer locks), then no amount of database wizardry can save you, as the problem can still manifest even if you log in a separate transaction or on a separate connection.
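
In other words (a sketch with a hypothetical dbo.AuditLog table), the safe pattern is the default isolation level, new rows only, and no reads of the log table inside the transaction:

    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- the default anyway

    BEGIN TRANSACTION;
        -- ... normal application work ...
        INSERT INTO dbo.AuditLog (EventTime, UserName, Detail)
        VALUES (GETUTCDATE(), SUSER_SNAME(), N'balance updated');
        -- No SELECT against dbo.AuditLog in here: reading the log inside the
        -- transaction is what breaks the 'append-only' property.
    COMMIT TRANSACTION;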

Remus Rusanu
"So as long as you do insert only logging in READ COMMITTED isolation level transactions you are safe." This is not true. If you are participating in a big transaction you can block the whole transaction when trying to insert stuff the logging table (if the logging table is blocked because you are running some crazy report off it or something). Also this opens you up to deadlock central!
Sam Saffron
No, simply not true. INSERT will *not* block behind a (read committed) report, no matter what you do in the transaction. Also, concurrent INSERTs will never appear in a deadlock chain, again because they don't block behind each other nor behind SELECTs. The things that can block an insert are: a range S lock that covers the insert point (i.e. a serializable report), an S lock on the table (i.e. escalation from a repeatable read report), or X locks (i.e. some activity *other* than 'append only', like updates that escalated to a table X lock).
Remus Rusanu
INSERT into a table (with a clustered index) in read committed acquires an IX lock on the page and an X lock on the key. If it cannot acquire the IX lock on the page it will wait; if it cannot acquire the X lock on the key it will wait. SELECT statements need shared locks. The locks can conflict. Deadlocks can exist. Read: http://msdn.microsoft.com/en-us/library/ms186396.aspx
Sam Saffron
S locks conflicting with intent locks can occur only in escalation scenarios, and S escalation can occur only at higher isolation levels. The X lock of an *insert* can conflict only with range locks (since it's a *NEW* row, a new key). Range locks, again, cannot occur unless higher isolation levels are used. I must emphasize again: an APPEND ONLY log will not cause deadlocks, period. As a bonus, it will also not conflict with READ COMMITTED reporting.
Remus Rusanu
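
One way to see the locks being debated above is to hold an uncommitted insert open and inspect sys.dm_tran_locks for the current session (SQL Server 2005 and above; dbo.AuditLog is again a hypothetical table):

    BEGIN TRANSACTION;
        INSERT INTO dbo.AuditLog (EventTime, Detail)
        VALUES (GETUTCDATE(), N'lock demo');

        SELECT resource_type, request_mode, request_status
        FROM sys.dm_tran_locks
        WHERE request_session_id = @@SPID;
        -- Typically shows IX on the OBJECT and PAGE plus an X on the new KEY,
        -- none of which blocks READ COMMITTED readers of existing rows.
    ROLLBACK TRANSACTION;
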
+2  A: 

If you don't care about consistency on your logging table, why not perform all the logging from a separate thread?

I probably would not wait for transactions to complete before logging, since the log can be pivotal in diagnosing long-running transactions. Also, this enables you to see all the work done by a transaction that rolled back.

Grab the stack trace and all of your logging data in the logging thread, chuck it on a queue when there are new logging messages, flush them to the db in a single transaction.

Steps to minimizing locking:

  • (KEY) perform all appends to the logging table outside of the main thread/connection/transaction.
  • Ensure your logging table has a monotonically increasing clustered index (e.g. an int identity) so each appended log message lands at the end of the table. This ensures the pages being inserted into are usually in memory and avoids the performance hits you get with heap tables. (See the table sketch after this list.)
  • Perform multiple appends to the log in a transaction (10 inserts in a transaction are faster than 10 inserts outside a transaction and usually acquire/release fewer locks).
  • Give it a break. Only perform logging to your db every N milliseconds. Batch up bits of work.
  • If you need to report on stuff historically, you can consider partitioning your logging table. Example: you could create a new logging table every month, and at the same time have a log VIEW that is a UNION ALL of all the older logging tables (also sketched below). Perform the reporting against the most appropriate source.
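
A sketch of the clustered-index and partitioning points above, with hypothetical names and dates (an int IDENTITY clustered key so appends always hit the 'hot' last page, plus a UNION ALL view over monthly tables for reporting):

    CREATE TABLE dbo.AuditLog_2009_05 (
        AuditId   int IDENTITY(1,1) NOT NULL,
        EventTime datetime      NOT NULL DEFAULT (GETUTCDATE()),
        Detail    nvarchar(max) NULL,
        CONSTRAINT PK_AuditLog_2009_05 PRIMARY KEY CLUSTERED (AuditId)
    );
    CREATE TABLE dbo.AuditLog_2009_06 (
        AuditId   int IDENTITY(1,1) NOT NULL,
        EventTime datetime      NOT NULL DEFAULT (GETUTCDATE()),
        Detail    nvarchar(max) NULL,
        CONSTRAINT PK_AuditLog_2009_06 PRIMARY KEY CLUSTERED (AuditId)
    );
    GO
    CREATE VIEW dbo.AuditLogHistory
    AS
        SELECT AuditId, EventTime, Detail FROM dbo.AuditLog_2009_05
        UNION ALL
        SELECT AuditId, EventTime, Detail FROM dbo.AuditLog_2009_06;
    GO

New messages go only into the current month's table; reports that need history read the view.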

You will get better performance by flushing multiple logging messages in a single (smallish) transaction, and have the advantage that if 10 threads are doing work and logging stuff, only a single thread is flushing stuff to the logging table. This pipelining actually makes stuff scale better.
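
A sketch of that flush step (the table name is hypothetical; the batch would come from whatever in-memory queue the logging thread drains):

    BEGIN TRANSACTION;
        INSERT INTO dbo.AuditLog (EventTime, Detail) VALUES (GETUTCDATE(), N'msg 1');
        INSERT INTO dbo.AuditLog (EventTime, Detail) VALUES (GETUTCDATE(), N'msg 2');
        INSERT INTO dbo.AuditLog (EventTime, Detail) VALUES (GETUTCDATE(), N'msg 3');
        -- one INSERT per queued message; the transaction log flushes once at
        -- COMMIT instead of once per statement, which is most of the win.
    COMMIT TRANSACTION;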

Sam Saffron