ansaurus

Question

sql server 2005 deadlock times out in production, not in test environment: why?

Answer 1

A:

You can specify a Deadlock priority ffor a the session using

SET DEADLOCK_PRIORITY LOW | MEDIUM | HIGH

See this MSDN link for details.

You can also use the following command to view the open transactions

DBCC OPENTRAN (db_name)

This command may help you identify what is causing the deadlock. See MSDN for more info.

John Hunter 2009-08-27 15:29:41

Answer 2

A:

What are the queries being run? What is actually causing the deadlock?

You say you have two connections A and B. A runs sql1 then sql2, while B runs sql2 then sql1. So, what is the work (queries) being done? More importantly, where are the transactions? What isolation level are you using? What opens/closes the transactions? (Yes, this leads to questioning the exception processing used by your drivers--if they don't detect and properly process a returned "it didn't work" message, then you absolutely need to take them out back and shoot them--bullets or penicillin, your call.)

Understanding the explicit details underlying the deadlock will allow you to recreate it. I'd first try to recreate it "below" your application -- that is, open up two windows in SSMS, and recreate the application's actions step by step, by hand if/as necessary. Once you can do this, step back and replicate that in your application--all on your development servers, of course!

(A thought--are your Dev databases copies of your Production DBs? If Dev DBs are orders of magnitude smaller than Prod ones, your queries may be the same but what SQL does "under the hood" will be vastly different.)

A last thought, SQL will detect and process deadlocks automatically (I really don't think you can disable this), if yours are running overnight then I don't think you have a deadlock, but rather just a conventional locking/blocking issue.

[Posting this now -- going to look something up, will check back later.]
[Later]

Interesting--SQL Server 2005 compact edition does not detect deadlocks, it only does timeouts. You're not using that in Dev, are you?

I see no way to "turn off" or otherwise control the deadlock timeout period. I hit and messed with deadlocks just last week, and some arbitrary testing then indicated that deadlocks are detected and resolved in (for our dev server) under 5 seconds. It truly seems like you don't have deadlocks on you Dev machine, just blocking. But realize that this stuff is hard for "armchair DBAs" to analyzed, you'd really need to sit down and do some serious analysis of what's going on within the system when this problem is occuring.

Philip Kelley 2009-08-27 15:47:01

Answer 3

A:

The explanation of why the JDBc connection enters the incorrect state is given here: The server failed to resume the transaction... Why?. You should upgrade to JDBC SQL driver v2.0 before anything else. The link also contains advice on how to fix the application processing to avoid this situation, most importantly about avoiding the mix of JDBC transaction API with native Transact-SQL transactions.

As for the deadlock repro: you did not recreate a deadlock in test. You just blocked waiting for a transaction to commit. A deadlock is a different thing and SQL Server will choose a victim, you do not have to set deadlock priority, lock timeouts or anything. Deadlock priorities are a completely different topic and are used to choose the victim in certain scenarios like high priority vs. low priority overnight batch processing.

Any deadlock investigation should start with understanding the deadlock, if you want to eliminate it. The Dedlock Graph Event Class in Profiler is the perfect starting point. With the deadlock graph info you can see what resources is the deadlock occuring on and what statements are involved. Most times the solution is either to fix the order of updates in application (always follow the same order) or fix the access path (ie. add an index).

Update

The UPDATE .. WHERE key IN (SELECT ...) is usually deadlocking because the operation is not atomic. Multiple threads can return the the same IN list because the SELECT part does not lock anything. This is just a guess, to properly validate you must look at the deadlock info.
To validate your hand made test for deadlocks you should validate that the blocking SPIDs form a loop. Look at SELECT session_id, blocking_session_id FROM sys.dm_exec_requests WHERE blocking_session_id <> 0. If the result contains a loop (eg. A blocked by B and B blocked by A) adn the server does not trigger a deadlock, that's a bug. However, what you will find is that the blocking list will not form a loop, will be something A blocked by B and B blocked by C and C not in the list, which means you have done something wrong in the repro test.

Remus Rusanu 2009-08-27 16:41:27

The deadlock graph info is good, but extracting and reviewing the underlying XML (from the Text column) provides a LOT more information on the deadlock.

Philip Kelley 2009-08-27 16:45:20

>What are the queries being run? What is actually causing the deadlock?In my test environment, I ran very simple queries:sql1: UPDATE principal SET name = name + '.' WHERE principal_id = 71sql2: UPDATE principal SET name = name + '.' WHERE principal_id = 72Then executed them in chiastic/criss-cross order, i.e. w/o any commits. connectionA sql1 connection B sql2 sql1 sql2In production, our 'problematic query' ("prodbad") looked liked this: UPDATE post SET lock_flag = ? WHERE thread_id IN (SELECT thread_id FROM POST WHERE post_id = ?)

2009-08-28 16:47:09

Answer 4

A:

[ This is a response to the answers. The UI does not allow longer 'comments' on answers]

What are the queries being run? What is actually causing the deadlock?

In my test environment, I ran very simple queries:

sql1: UPDATE principal SET name = name + '.' WHERE principal_id = 71

sql2: UPDATE principal SET name = name + '.' WHERE principal_id = 72

Then executed them in chiastic/criss-cross order, i.e. w/o any commits.

connectionA

 sql1

   connectionB

      sql2

      sql1

 sql2

This to me seems like a basic example of a deadlock. If this a "mere lock", however, and not a deadlock, please disabuse me of this notion.

In production, our 'problematic query' ("prodbad") looked liked this:

UPDATE post SET lock_flag = ? WHERE thread_id IN (SELECT thread_id FROM POST WHERE post_id = ?)

Note a few things:

1) This "prod problem query" actually works. AFAIK it had a deadlock this one time

2) I suspect that the problem lies in page locking, i.e. pessimistic locking due to reads elsewhere in the transaction

3) I do not know what sql this transaction executed prior to this query.

4 )This query is an example of "I can do that in one sql statement" processing, which while seems clever to the programmer ultimately causes much more IO than running two queries:

queryM:SELECT thread_id FROM POST WHERE post_id = ?

queryN: UPDATE post SET lock_flag = ? WHERE thread_id = <>

*>(A thought--are your Dev databases copies of your Production DBs?

If Dev DBs are orders of magnitude smaller than Prod ones, your queries may be the same but >what SQL does "under the hood" will be vastly different.)*

In this case the prod and dev db's differ. "Prod server" had tons of data. "Dev db" had little data. The queries were very differently. All I wanted to do was recreate a deadlock.

*> The server failed to resume the transaction... Why?. You should upgrade to JDB

C SQL driver v2.0 before anything else.*

Thanks. We plan on this change. Switching drivers introduces a little bit of risk, so we'll need to run some test..

To recap:

I had the "bright idea" to force a simple deadlock and see if my connection was "whacked/hosed/borked/etc." The deadlock, however, behaved differently than in production.

2009-08-28 17:00:26

ansaurus

tags:

views:

answers:

sql server 2005 deadlock times out in production, not in test environment: why?

related questions