views:

175

answers:

3

We are working on a large Java program that was converted from a Forte application. During the day we are getting Blocking SPID's in the server. We had a DBA visit yesterday and he set up a profile template to run to catch the locking/blocking action. When we run this profile the blocking problem goes away. Why?

This application is distributed using RMI and has around 70 users. We are using SQL 2000 and windows 2000 servers to keep compatibility with a bunch of old VB helper applications.

We have traced the blocking down to a specific screen and stored procedure but now we can't get the errors to happen with profiler running.

Thanks for any help!

Theo

+1  A: 

The good old Heisenberg debugger problem.

Any profiler does two things: it adds code in place to invoke the debugger, and it stores data. The first one can thward optimizers, and the second can change the timing of something, causing a race condition to go away.

This blocking SPID problem seems to show up on Google a lot; the reason appears to be that it occurs when some resource is locked when another one wants it, so the timing error sounds likely.

Microsoft has an article on how to deal with the problem.

Charlie Martin
Well, a DBA is most likely to be adding a SQL trace - so that doesn't change code; but yes, the overall execution can change.
Marc Gravell
A: 

Just a collection of random thoughts.. I've seen traces take a server down but never make things better.

What trace template are you using? (These are taken from SQL Server 2005 tools, sorry)

  • The "Standard (default)" one tracks high levels calls and logon/logout
  • The "TSQL_SPs" tracks statement calls which would be a lot more intrusive

Is it binary and guaranteed too? Trace on= no blocks, trace off = blocks, or is it unlucky coincidence? When you're all watching the DBA does someone stop clicking in the client and come to watch?

Is something else being switched off as part of the trace. That is, are you using profiler or a scripted trace (lots of sp_trace_set% statements)?. In a scripted trace, there may be something that switches something else off.

gbn
A: 

Sorry for not using the comments but I don't have the required reputation to comment.

Thanks for the article Charlie. It was more pertinant than the SQL2000-2005 one I was using.

GBN: I'm using the Standard trace with 3 locking events added.

I am now running the system in production using profiler and sending the log to a table limmited to 250,000 records. We have been up today with no blocks at all. If I turn the profiler off I will see blocks in just a few minutes. Very strange.

We are scheduling some downtime tomorrow and we will try a few things to help diagnose. Any ideas would help.

Thanks for the help guys!

Theo