I am debugging an application that I suspect is getting deadlocked and hanging. However, this only occurs every few days, and it never happens on my computer so I can't hook a debugger up to it. Are there any utilities or methods I can use to query the running application and find out what methods/locks/whatever it is deadlocked on?

Update: Typically the application is running at a customer location and I don't have access to the machine, and I'm not entirely comfortable asking them to install tons of software.

+1  A: 

I would recommend http://www.jetbrains.com/profiler/

madcolor
Thanks - I have just posted an update to the question. Also, I don't see anything about deadlock detection with this profiler - did I miss something?
Jon Tackabury
A: 

The end of http://blogs.technet.com/askperf/archive/2007/06/15/capturing-application-crash-dumps.aspx says that on Vista at least you can get a crash dump of a running process using Task Manager.

ChrisW
A: 

This is a very interesting problem and a pain because it only happens every few days. I found this article on CodeProject. It might be a start for you.

An old school approach is to log a ton of messages and use logfiles to try to detect when it occurs. :)

siz
+1  A: 

You actually have a very interesting problem there. There are several things you can do:

Use a good logger: One way to track down a multithreading error is to have a logger that prints each action taken and the thread that performed it, so that you end up with a trace that guides you to the error. This is a fairly easy solution if you are able to add the logger.

Use FSP: Model your multithreaded system in FSP (Finite State Processes). This gives you a finite state machine of the process, which you can walk through to find the error. This is the more mathematical approach.

The two approaches I describe mirror the main difference in how some British and American universities teach multithreaded development. In the U.K., professors tend to prove that a system has no errors using FSP before they program it, while the Americans prefer to test it to show that it works correctly. It is a matter of taste.

I really recommend reading this book: Jeff Magee and Jeff Kramer: Concurrency: State Models and Java Programs, Wiley, 1999
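The logger suggestion above could look something like the following. This is only a minimal sketch: the `ThreadLog` class, its method names, and the log format are all made up for illustration, and a real application would more likely use whatever logging library it already has.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper; every name here is illustrative, not from the answer above.
public static class ThreadLog
{
    private static readonly object Gate = new object();

    // Builds one log entry with a UTC timestamp and the managed thread id.
    public static string Format(string message)
    {
        return string.Format("{0:O} [thread {1}] {2}",
            DateTime.UtcNow, Thread.CurrentThread.ManagedThreadId, message);
    }

    // Appends the entry to a file. The lock only keeps lines from interleaving;
    // nothing else is called while it is held, so it cannot join a lock cycle.
    public static void Write(string path, string message)
    {
        string line = Format(message);
        lock (Gate)
        {
            File.AppendAllText(path, line + Environment.NewLine);
        }
    }
}
```

If you write an entry before requesting a lock, after acquiring it, and after releasing it, then after a hang the last "waiting for" entry of each thread without a matching "acquired" entry points at the lock that thread is stuck on.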

mandel
+4  A: 

You can use WinDbg to inspect the threads in the application. Here's a brief plan of what you could do.

  • When the application hangs, copy the WinDbg files to the machine.
  • Either attach WinDbg to the process or use ADPlus to get a hang dump of the process. If you choose ADPlus, you then load the dump in WinDbg.
  • From WinDbg you load sos.dll, so you can inspect managed code.
  • The !threads command will show you all threads in the application, and the !clrstack command will show you what they are doing. Use ~*e !clrstack to dump the call stack of all threads. Look for calls to Wait methods, as they indicate locking.
  • The !syncblk command will give you information about which threads are holding the different locks.
  • To find out what lock a given thread is trying to acquire, switch to the thread and inspect stack objects (!dso). From here you should be able to find the lock the thread is trying to acquire.

Clarification: WinDbg doesn't require a regular install. Just copy the files. Also, if you take the hang dump, you can continue debugging on another machine if so desired.

Brian Rasmussen
Excellent - I'll give this a try next time it happens.
Jon Tackabury
+4  A: 

Instead of using the regular lock & Monitor.Enter approach to lock some data, you can also use a 'TimedLock' structure. This TimedLock throws an exception if the lock couldn't be acquired in a timely fashion, and it can also give you a warning if you have some locks that you didn't release.

This article by Ian Griffiths may help.
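To give a rough idea of the approach, here is a minimal sketch built on `Monitor.TryEnter`. It is not the implementation from the article: this `TimedLock` class and its `Acquire` method are assumptions made for illustration, and the warning about unreleased locks is left out.

```csharp
using System;
using System.Threading;

// Hypothetical sketch; not the TimedLock from the linked article.
public sealed class TimedLock : IDisposable
{
    private readonly object _target;

    private TimedLock(object target)
    {
        _target = target;
    }

    // Tries to enter the monitor and throws instead of blocking forever.
    public static TimedLock Acquire(object target, TimeSpan timeout)
    {
        if (!Monitor.TryEnter(target, timeout))
        {
            throw new TimeoutException(
                "Could not acquire lock within " + timeout +
                " on thread " + Thread.CurrentThread.ManagedThreadId);
        }
        return new TimedLock(target);
    }

    // Releases the monitor when the using block ends.
    public void Dispose()
    {
        Monitor.Exit(_target);
    }
}
```

You would then write `using (TimedLock.Acquire(sync, TimeSpan.FromSeconds(10))) { ... }` where you previously had `lock (sync) { ... }`. When a deadlock occurs, the exception and its stack trace can be logged at the customer site instead of the application hanging silently.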

Frederik Gheysels
I'm going to give this a try first, and add a bunch of logging to try and track down the failed locks. Thanks!
Jon Tackabury
There are potential problems with TimedLock, as this article shows: http://blogs.microsoft.co.il/blogs/sasha/archive/2009/01/27/why-concurrency-is-hard-or-timedlock-can-get-you-in-trouble.aspx
Tomer Pintel
A: 

In addition to the answers here, another thing that you would find useful with thread programming in general is to make sure your dev box is a multiprocessor machine. Deadlocks in particular are (usually) reproduced much more reliably on one.

Tim Jarvis
A: 

Timeouts in concurrent programming are a horrible idea. They lead to non-determinism and thus to behaviour that can't be reproduced. Try using a deadlock detection tool like CHESS. Better yet, minimize the number of locks used with lock-free algorithms, or eschew locks entirely: partition your program into single-threaded compartments and use queues to pass data between compartments (better known as message-passing/actor concurrency).
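As a rough illustration of the compartment idea, here is a minimal sketch that assumes .NET 4 or later for `BlockingCollection<T>`. The `Counter` class and its members are hypothetical names chosen for the example. One worker thread owns the state, and other threads only post messages to its queue.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical sketch of a single-threaded compartment fed by a queue.
public sealed class Counter : IDisposable
{
    private readonly BlockingCollection<int> _inbox = new BlockingCollection<int>();
    private readonly Thread _worker;
    private int _total; // written only by the worker thread

    public Counter()
    {
        _worker = new Thread(Run);
        _worker.IsBackground = true;
        _worker.Start();
    }

    // Any thread may post a message; callers take no lock on the state.
    public void Post(int amount)
    {
        _inbox.Add(amount);
    }

    // Stops accepting messages, waits for the queue to drain, returns the total.
    public int Complete()
    {
        _inbox.CompleteAdding();
        _worker.Join();
        return _total;
    }

    // The only code that touches _total while messages are flowing.
    private void Run()
    {
        foreach (int amount in _inbox.GetConsumingEnumerable())
        {
            _total += amount;
        }
    }

    public void Dispose()
    {
        Complete();
        _inbox.Dispose();
    }
}
```

Because callers never hold a lock of their own while waiting for another one, there is no lock ordering to get wrong. The cost is that results come back asynchronously, here only after `Complete()`.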

naasking