views:

712

answers:

4

Hi all,

In my development environment, I seek to recreate a production issue we face with MSSQL 2005. This issue has two parts:

The Problem

1) A deadlock occurs and MSSQL selects one connection ("Connection X") as the 'victim'. 2) All subsequent attempts to use "Connection X" fail (we use connection pooling). MSSQL says "The server failed to resume the transaction"

Of the two, #2 if more serious: since "connection X" is whacked every "round robin" attempt to re-use "connection x" fails--and mysterious "random" errors appear to the user. We must restart the server.

Why I Write

At this point, however, I wish to recreate problem #1. I can create a deadlock easily.

But here's my issue: whereas in production, MSSQL chooses one connection (SPID) as the 'deadlock victim', in my test environment, the deadlock just hangs...and hangs and hangs. Forever? I'm not sure, but I left it hanging overnight and it still hung in the morning.

So here's the question: how can I make sql server "choose a deadlock victim" when a deadlock occurs?

Attempts so Far

I tried setting the "lock_timeout" parameter via the jdbc url ("lockTimeout=5000"), however I got a different message than in production (in test,"Lock request time out period exceeded." instead of in production "Transaction (Process ID 59) was deadlocked on lock resources with another process and has been chosen as the deadlock victim.")

Some details on problem #2

I've researched this "unable to resume the transaction" problem and found a few things:

  • bad exception handling may cause this problem. E.g.: the java code does not close the Statement/PreparedStatement and the driver's implementation of "Connection" is stuck with a bad/stale/old "transaction ID"
  • A jdbc driver upgrade may make the problem go away.

For now, however, I just want to recreate a deadlock and make sql server "choose a deadlock victim".

thanks in advance!

Appendix A. Technical Environment

Development:

  • sql server 2005 SP3 (9.00.4035.00)
  • driver: sqljdbc.jar version 1.0
  • Jboss 3.2.6
  • jdbc url: jdbc:sqlserver://<>;

Production:

  • sql server 2005 SP2 (9.00.3042.00)
  • driver: sqljdbc.jar version 1.0
  • Jboss 3.2.6
  • jdbc url: jdbc:sqlserver://<>;

Appendix B. Steps to force a deadlock

  • get connection A
  • get connection B
  • run sql1 with connection A
  • run sql2 with connection B
  • run sql1 with connection B
  • run sql2 with connection A

where sql1: update member set name = name + 'x' WHERE member_id = 71

sql2: update member set name = name + 'x' WHERE member_id = 72

A: 

You can specify a Deadlock priority ffor a the session using

SET DEADLOCK_PRIORITY LOW | MEDIUM | HIGH

See this MSDN link for details.

You can also use the following command to view the open transactions

DBCC OPENTRAN (db_name)

This command may help you identify what is causing the deadlock. See MSDN for more info.

John Hunter
A: 

What are the queries being run? What is actually causing the deadlock?

You say you have two connections A and B. A runs sql1 then sql2, while B runs sql2 then sql1. So, what is the work (queries) being done? More importantly, where are the transactions? What isolation level are you using? What opens/closes the transactions? (Yes, this leads to questioning the exception processing used by your drivers--if they don't detect and properly process a returned "it didn't work" message, then you absolutely need to take them out back and shoot them--bullets or penicillin, your call.)

Understanding the explicit details underlying the deadlock will allow you to recreate it. I'd first try to recreate it "below" your application -- that is, open up two windows in SSMS, and recreate the application's actions step by step, by hand if/as necessary. Once you can do this, step back and replicate that in your application--all on your development servers, of course!

(A thought--are your Dev databases copies of your Production DBs? If Dev DBs are orders of magnitude smaller than Prod ones, your queries may be the same but what SQL does "under the hood" will be vastly different.)

A last thought, SQL will detect and process deadlocks automatically (I really don't think you can disable this), if yours are running overnight then I don't think you have a deadlock, but rather just a conventional locking/blocking issue.

[Posting this now -- going to look something up, will check back later.]
[Later]

Interesting--SQL Server 2005 compact edition does not detect deadlocks, it only does timeouts. You're not using that in Dev, are you?

I see no way to "turn off" or otherwise control the deadlock timeout period. I hit and messed with deadlocks just last week, and some arbitrary testing then indicated that deadlocks are detected and resolved in (for our dev server) under 5 seconds. It truly seems like you don't have deadlocks on you Dev machine, just blocking. But realize that this stuff is hard for "armchair DBAs" to analyzed, you'd really need to sit down and do some serious analysis of what's going on within the system when this problem is occuring.

Philip Kelley
A: 

The explanation of why the JDBc connection enters the incorrect state is given here: The server failed to resume the transaction... Why?. You should upgrade to JDBC SQL driver v2.0 before anything else. The link also contains advice on how to fix the application processing to avoid this situation, most importantly about avoiding the mix of JDBC transaction API with native Transact-SQL transactions.

As for the deadlock repro: you did not recreate a deadlock in test. You just blocked waiting for a transaction to commit. A deadlock is a different thing and SQL Server will choose a victim, you do not have to set deadlock priority, lock timeouts or anything. Deadlock priorities are a completely different topic and are used to choose the victim in certain scenarios like high priority vs. low priority overnight batch processing.

Any deadlock investigation should start with understanding the deadlock, if you want to eliminate it. The Dedlock Graph Event Class in Profiler is the perfect starting point. With the deadlock graph info you can see what resources is the deadlock occuring on and what statements are involved. Most times the solution is either to fix the order of updates in application (always follow the same order) or fix the access path (ie. add an index).

Update

  • The UPDATE .. WHERE key IN (SELECT ...) is usually deadlocking because the operation is not atomic. Multiple threads can return the the same IN list because the SELECT part does not lock anything. This is just a guess, to properly validate you must look at the deadlock info.

  • To validate your hand made test for deadlocks you should validate that the blocking SPIDs form a loop. Look at SELECT session_id, blocking_session_id FROM sys.dm_exec_requests WHERE blocking_session_id <> 0. If the result contains a loop (eg. A blocked by B and B blocked by A) adn the server does not trigger a deadlock, that's a bug. However, what you will find is that the blocking list will not form a loop, will be something A blocked by B and B blocked by C and C not in the list, which means you have done something wrong in the repro test.

Remus Rusanu
The deadlock graph info is good, but extracting and reviewing the underlying XML (from the Text column) provides a LOT more information on the deadlock.
Philip Kelley
>What are the queries being run? What is actually causing the deadlock?In my test environment, I ran very simple queries:sql1: UPDATE principal SET name = name + '.' WHERE principal_id = 71sql2: UPDATE principal SET name = name + '.' WHERE principal_id = 72Then executed them in chiastic/criss-cross order, i.e. w/o any commits. connectionA sql1 connection B sql2 sql1 sql2In production, our 'problematic query' ("prodbad") looked liked this: UPDATE post SET lock_flag = ? WHERE thread_id IN (SELECT thread_id FROM POST WHERE post_id = ?)
A: 

[ This is a response to the answers. The UI does not allow longer 'comments' on answers]

What are the queries being run? What is actually causing the deadlock?

In my test environment, I ran very simple queries:

sql1: UPDATE principal SET name = name + '.' WHERE principal_id = 71

sql2: UPDATE principal SET name = name + '.' WHERE principal_id = 72

Then executed them in chiastic/criss-cross order, i.e. w/o any commits.

connectionA

 sql1

   connectionB

      sql2

      sql1

 sql2

This to me seems like a basic example of a deadlock. If this a "mere lock", however, and not a deadlock, please disabuse me of this notion.

In production, our 'problematic query' ("prodbad") looked liked this:

UPDATE post SET lock_flag = ? WHERE thread_id IN (SELECT thread_id FROM POST WHERE post_id = ?)

Note a few things:

1) This "prod problem query" actually works. AFAIK it had a deadlock this one time

2) I suspect that the problem lies in page locking, i.e. pessimistic locking due to reads elsewhere in the transaction

3) I do not know what sql this transaction executed prior to this query.

4 )This query is an example of "I can do that in one sql statement" processing, which while seems clever to the programmer ultimately causes much more IO than running two queries:

queryM:SELECT thread_id FROM POST WHERE post_id = ?

queryN: UPDATE post SET lock_flag = ? WHERE thread_id = <>

*>(A thought--are your Dev databases copies of your Production DBs?

If Dev DBs are orders of magnitude smaller than Prod ones, your queries may be the same but >what SQL does "under the hood" will be vastly different.)*

In this case the prod and dev db's differ. "Prod server" had tons of data. "Dev db" had little data. The queries were very differently. All I wanted to do was recreate a deadlock.

*> The server failed to resume the transaction... Why?. You should upgrade to JDB

C SQL driver v2.0 before anything else.*

Thanks. We plan on this change. Switching drivers introduces a little bit of risk, so we'll need to run some test..

To recap:

I had the "bright idea" to force a simple deadlock and see if my connection was "whacked/hosed/borked/etc." The deadlock, however, behaved differently than in production.