Our Windows app is often hanging in memory and I'm trying to use windbg to track down the problem. I'm very new to windbg and could use some advice (I have started to read Advanced Windows Debugging though).

The app is a mix of C++ and COM objects written in VB. Occasionally when you exit, the app appears to go away but task manager shows it hanging around in memory, apparently idle.

!threads shows me this:

ThreadCount: 2
UnstartedThread: 0
BackgroundThread: 2
PendingThread: 0
DeadThread: 0
Hosted Runtime: no
                                      PreEmptive   GC Alloc           Lock
       ID OSID ThreadOBJ    State     GC       Context       Domain   Count APT Exception
   0    1 175c 001aa040      4220 Enabled  09131b78:09131fe8 001a2b80     0 STA
   6    2 143c 001b4b48      b220 Enabled  00000000:00000000 001a2b80     0 MTA (Finalizer)

To my untrained eye, it looks like the process is being kept alive because the finalizer queue is blocked by a single-threaded apartment (STA). Does this seem reasonable?

~0kb yields:

ntdll!KiFastSystemCallRet
user32!NtUserGetMessage+0xc
mfc80!AfxInternalPumpMessage+0x18 [f:\sp\vctools\vc7libs\ship\atlmfc\src\mfc\thrdcore.cpp @ 153]
mfc80!CWinThread::Run+0x54 [f:\sp\vctools\vc7libs\ship\atlmfc\src\mfc\thrdcore.cpp @ 625]
mfc80!AfxWinMain+0x69 [f:\sp\vctools\vc7libs\ship\atlmfc\src\mfc\winmain.cpp @ 47]
WARNING: Stack unwind information not available. Following frames may be wrong.
OurApp+0x7e8274
kernel32!BaseProcessStart+0x23

~6kb yields:

ntdll!KiFastSystemCallRet
ntdll!ZwWaitForMultipleObjects+0xc
kernel32!WaitForMultipleObjectsEx+0x12c
kernel32!WaitForMultipleObjects+0x18
mscorwks!WKS::WaitForFinalizerEvent+0x7a
mscorwks!WKS::GCHeap::FinalizerThreadWorker+0x75
mscorwks!Thread::UserResumeThread+0xfb
mscorwks!Thread::DoADCallBack+0x355
mscorwks!Thread::DoADCallBack+0x541
mscorwks!ManagedThreadBase_NoADTransition+0x32
mscorwks!ManagedThreadBase::FinalizerBase+0xb
mscorwks!WKS::GCHeap::FinalizerThreadStart+0xbb
mscorwks!Thread::intermediateThreadProc+0x49
kernel32!BaseThreadStart+0x37

I would appreciate a little course correction here. If my guess of a blocked finalizer seems reasonable, please let me know. I would also be very happy to get some advice on figuring out what exactly is blocking.

Edit:

Shane asked for the output from !analyze. This is actually from a different dump -- I have lots of them and they all look pretty much the same.

FAULTING_IP: 
+18a952f00ebdf74
00000000 ??              ???

EXCEPTION_RECORD:  ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 00000000
   ExceptionCode: 80000007 (Wake debugger)
  ExceptionFlags: 00000000
NumberParameters: 0

BUGCHECK_STR:  80000007

PROCESS_NAME:  OurApp.exe

OVERLAPPED_MODULE: Address regions for 'OurApp' and 'Unknown_Module_00350062' overlap

ERROR_CODE: (NTSTATUS) 0x80000007 - {Kernel Debugger Awakened}  the system debugger was awakened by an interrupt.

EXCEPTION_CODE: (HRESULT) 0x80000007 (2147483655) - Operation aborted

NTGLOBALFLAG:  0

APPLICATION_VERIFIER_FLAGS:  0

MANAGED_STACK: !dumpstack -EE
OS Thread Id: 0x4490 (0)
Current frame: 
ChildEBP RetAddr  Caller,Callee

DERIVED_WAIT_CHAIN:  

Dl Eid Cid     WaitType
-- --- ------- --------------------------
   0   48c8.4490 Speculated (Triage)    -->
   5   48c8.4b74 Event                  

WAIT_CHAIN_COMMAND:  ~0s;k;;~5s;k;;

BLOCKING_THREAD:  00004b74

DEFAULT_BUCKET_ID:  APPLICATION_HANG_BlockedOn_EventHandle

PRIMARY_PROBLEM_CLASS:  APPLICATION_HANG_BlockedOn_EventHandle

LAST_CONTROL_TRANSFER:  from 7c90df4a to 7c90e514

FAULTING_THREAD:  00000005

STACK_TEXT:  
0882fcd0 7c90df4a 7c809590 00000002 0882fcfc ntdll!KiFastSystemCallRet
0882fcd4 7c809590 00000002 0882fcfc 00000001 ntdll!ZwWaitForMultipleObjects+0xc
0882fd70 7c80a115 00000002 7a3b8d28 00000000 kernel32!WaitForMultipleObjectsEx+0x12c
0882fd8c 79f92c5b 00000002 7a3b8d28 00000000 kernel32!WaitForMultipleObjects+0x18
0882fdac 79f970b8 001b1ad8 0882feb0 001a0b18 mscorwks!WKS::WaitForFinalizerEvent+0x77
0882fdc0 79e984cf 0882feb0 00000000 00000000 mscorwks!WKS::GCHeap::FinalizerThreadWorker+0x49
0882fdd4 79e9846b 0882feb0 0882fe5c 79f7762b mscorwks!Thread::DoADCallBack+0x32a
0882fe68 79e98391 0882feb0 9f3f02e2 00000000 mscorwks!Thread::ShouldChangeAbortToUnload+0xe3
0882fea4 79eef74c 0882feb0 00000000 001a86c0 mscorwks!Thread::ShouldChangeAbortToUnload+0x30a
0882fecc 79eef75d 79f9706d 00000008 0882ff14 mscorwks!ManagedThreadBase_NoADTransition+0x32
0882fedc 79f3c6bc 79f9706d 9f3f0352 00000000 mscorwks!ManagedThreadBase::FinalizerBase+0xd
0882ff14 79f920a5 00000000 86fb6620 804fb078 mscorwks!WKS::GCHeap::FinalizerThreadStart+0xbb
0882ffb4 7c80b729 001a0b18 00730074 00610020 mscorwks!Thread::intermediateThreadProc+0x49
0882ffec 00000000 79f9205f 001a0b18 00000000 kernel32!BaseThreadStart+0x37


FOLLOWUP_IP: 
mscorwks!WKS::WaitForFinalizerEvent+77
79f92c5b 85c0            test    eax,eax

SYMBOL_STACK_INDEX:  4

SYMBOL_NAME:  mscorwks!WKS::WaitForFinalizerEvent+77

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: mscorwks

IMAGE_NAME:  mscorwks.dll

DEBUG_FLR_IMAGE_TIMESTAMP:  492b82c1

STACK_COMMAND:  ~5s ; kb

BUCKET_ID:  80000007_mscorwks!WKS::WaitForFinalizerEvent+77

FAILURE_BUCKET_ID:  APPLICATION_HANG_BlockedOn_EventHandle_80000007_mscorwks.dll!WKS::WaitForFinalizerEvent

WATSON_STAGEONE_URL:  http://watson.microsoft.com/StageOne/OurApp_exe/6_2_6_1/4a29a184/unknown/0_0_0_0/bbbbbbb4/80000007/00000000.htm?Retriage=1

Followup: MachineOwner
---------

0:000> !threads
ThreadCount: 2
UnstartedThread: 0
BackgroundThread: 2
PendingThread: 0
DeadThread: 0
Hosted Runtime: no
                                      PreEmptive   GC Alloc           Lock
       ID OSID ThreadOBJ    State     GC       Context       Domain   Count APT Exception
   0    1 4490 0019de20      4220 Enabled  09003658:09003fe8 001a86c0     0 STA
   5    2 4b74 001b1b08      b220 Enabled  00000000:00000000 001a86c0     0 MTA (Finalizer)
A:

The finalizer thread is idle and waiting for work; its stack trace looks fine. Thread 0 also looks fine and is idle: it is waiting for the next UI message.

Can you give some details on how you 'exit' the application? Given that the message loop is still running, it seems to me that something is wrong with your close-application logic.
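To illustrate why a running message loop points at the close logic, here is a portable C++ model rather than real Win32 code. AppModel, Msg and the std::deque queue are hypothetical stand-ins; in a real app the loop is GetMessage/TranslateMessage/DispatchMessage, and GetMessage returns 0 only when it retrieves WM_QUIT. The canExit flag is an assumption standing in for a check such as MFC's AfxOleCanExitApp(). The sketch shows that if the close path hides the window but never posts a quit message, the loop simply keeps waiting and the process stays alive with no visible UI.

```cpp
#include <cassert>
#include <deque>

// Hypothetical stand-ins for window messages; a real app uses MSG and
// GetMessage/TranslateMessage/DispatchMessage instead.
enum class Msg { Close, Other, Quit };

struct AppModel {
    std::deque<Msg> queue;     // models the UI thread's message queue
    bool windowVisible = true; // models the main window's visibility
    bool canExit = true;       // assumption: stands in for a check like AfxOleCanExitApp()

    void post(Msg m) { queue.push_back(m); }

    // Models a close handler that hides the window first and only posts a
    // quit message when nothing is keeping the app alive.
    void onClose() {
        windowVisible = false;
        if (canExit) post(Msg::Quit);
    }

    // Drains the queue. Returns true if a Quit message ended the loop (as
    // GetMessage returning 0 for WM_QUIT would), or false if the queue ran
    // dry first, i.e. a real loop would now block idle in GetMessage.
    bool pump() {
        while (!queue.empty()) {
            Msg m = queue.front();
            queue.pop_front();
            if (m == Msg::Quit) return true;
            if (m == Msg::Close) onClose();
            // Msg::Other would be dispatched to a window procedure here.
        }
        return false;
    }
};
```

With canExit set to false, a Close message hides the window and pump() returns false, which is the model's equivalent of thread 0 sitting idle in NtUserGetMessage in the ~0kb stack.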

Johannes Passing
It's an MFC Windows app. It's been a while since I delved into the code, but I believe the app ends by posting either an SC_CLOSE or a WM_QUIT message. I'm guessing it's stuck because some OLE object is still in memory, so AfxOleCanExitApp() returns false because the object count is > 0. I think that if I could use windbg better, I would be able to locate the leaked objects.
criddell
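One way to find which objects hold the app open is to tag every lock with its owner. MFC automation classes typically call AfxOleLockApp() in their constructor and AfxOleUnlockApp() in their destructor, and AfxOleCanExitApp() only reports whether any such locks remain, not who holds them. The sketch below is a hypothetical AppLockTracker (not an MFC API), written in plain standard C++ under the assumption that you would call it right next to the real lock/unlock calls and that all locking happens on the STA UI thread.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical debugging aid: a tagged shadow of the app lock count.
// Not thread-safe; assumes locks are taken and released on the UI thread.
class AppLockTracker {
public:
    // Call alongside the real lock (e.g. next to AfxOleLockApp()).
    void lock(const std::string& owner) { ++locks_[owner]; }

    // Call alongside the real unlock (e.g. next to AfxOleUnlockApp()).
    void unlock(const std::string& owner) {
        auto it = locks_.find(owner);
        assert(it != locks_.end() && "unlock without matching lock");
        if (it == locks_.end()) return;  // ignore unbalanced unlock in release builds
        if (--it->second == 0) locks_.erase(it);
    }

    // Mirrors the idea behind AfxOleCanExitApp(): exit only when no locks remain.
    bool canExit() const { return locks_.empty(); }

    // Owners still holding locks, formatted as "owner xN".
    std::vector<std::string> outstanding() const {
        std::vector<std::string> owners;
        for (const auto& entry : locks_)
            owners.push_back(entry.first + " x" + std::to_string(entry.second));
        return owners;
    }

private:
    std::map<std::string, int> locks_;  // owner tag -> outstanding lock count
};
```

For example, you could log outstanding() (say, with OutputDebugStringA) at the point where the exit check fails; whatever owner tags remain are the leak candidates to chase in the debugger.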
A: 

I agree with J. Passing.

Since one thread is running managed code, have you tried loading the SOS debugging extension in windbg to get the managed stack trace? You could also try windbg's "!analyze -v" command and see what that says.

Shane Powell
Shane, I've pasted the output from the analyze command above. Thanks for looking at this.
criddell
It's pretty much confirming what the other answer was saying. Can you also post the output from '!clrstack' after loading the SOS extension? That may help a little. Also check out http://blogs.msdn.com/tess/archive/2007/12/12/automated-net-hang-analysis.aspx
Shane Powell