I'm trying to troubleshoot a COM+ application that deadlocks intermittently. The last time it locked up, I was able to take a usermode dump of the dllhost process and analyze it using WinDbg. After inspecting all the threads and locks, it all boils down to a critical section owned by this thread:
ChildEBP RetAddr Args to Child
0deefd00 7c822114 77e6bb08 000004d4 00000000 ntdll!KiFastSystemCallRet
0deefd04 77e6bb08 000004d4 00000000 0deefd48 ntdll!ZwWaitForSingleObject+0xc
0deefd74 77e6ba72 000004d4 00002710 00000000 kernel32!WaitForSingleObjectEx+0xac
0deefd88 75bb22b9 000004d4 00002710 00000000 kernel32!WaitForSingleObject+0x12
0deeffb8 77e660b9 000a5cc0 00000000 00000000 comsvcs!PingThread+0xf6
0deeffec 00000000 75bb21f1 000a5cc0 00000000 kernel32!BaseThreadStart+0x34
The object it's waiting on is an event:
0:016> !handle 4d4 f
Handle 000004d4
Type Event
Attributes 0
GrantedAccess 0x1f0003:
Delete,ReadControl,WriteDac,WriteOwner,Synch
QueryState,ModifyState
HandleCount 2
PointerCount 4
Name <none>
No object specific information available
As far as I can tell, the event never gets signaled, causing the thread to hang and hold up several other threads in the process. Does anyone have any suggestions for next steps in figuring out what's going on?
Now, seeing as the method is called PingThread, is it possible that it's trying to ping another thread in the process that's already deadlocked?
UPDATE
This actually turned out to be a bug in the Oracle 10.2.0.1 client. Although, I'm still interested in ideas on how I could have figured this out without finding the bug in Oracle's bug database.