Hi,
I have encountered a strange situation where starting an application pool from a windows service (written in C#, set to "Automatic" startup) using WMI or ADSI immediately after the server reboots hangs.
I'll describe the issue:
We are developing a large application (Windows 2003 Server SP2, IIS 6.0) which contains the following main processes (these processes are invoked & initialized using a windows service startup procedure when the application is started):
1) XServer1.exe, XServer2.exe - These processes are native COM-Exe servers, contains some logics, but mainly supplies COM objects to other processes via DCOM (mainly .NET2COM interOp calls & pure COM calls). For example, some of the classic ASP "Application Scope static objects" (w3wp.exe) are COM objects which "live" inside these processes.
2) dllhost.exe - this is a COM+ application. Some of our DLLs are loaded into this process which acts as a "state server" (the same idea as the ASP.NET out-of-proc sessions server, but for classic ASP pages).
3) 3 different IIS application pools (we'll call them appPool1\2\3) - containers of our ASP pages, ASP.NET pages, WCF services etc. Code (native C++ COM dlls & C#) in these application pools (w3wp.exe's) usually makes DCOM calls to the processes described in (1) & (2). Only appPool1 can be configured as a Web Garden.
In order to Start\Stop our application we have written a windows Service (C#) which controls these procedures. Our service process is called XWinService.exe. The service depends on the following windows services (the list began with the first 4 services, ongoing tries made the list like this...):
W3SVC
aspnet_state
COMSysApp
DcomLaunch
winmgmt
lanmanserver
lanmanworkstation
seclogon
Browser
TermService
The summary of the Stop procedure of the application (implemented by the service):
1) Stop all 3 IIS application pools (appPool1\2\3) - This is done to prevent w3wp.exe processes to jump alive when the application is shut-down. This is implemented with WMI from C# (system.Management.dll)
2) Stop XServer1\2.exe
3) Stop the COM+ application (dllhost.exe).
The summary of the Start procedure of the application (implemented by the service):
1) Execute the Stop procedure - This ensures that no HTTP hits will wake a w3wp.exe process before it's time.
2) Invokes & Initializes the XServer1\2.exe COM-Exe servers - Initialization is required prior to any w3wp.exe invocation. Only after some object had been initialized, w3wp.exe's can access these servers. This is implemented by .NET2COM InterOp (eventually DCOM).
3) Invokes & initialized the dllhost.exe (COM+ application) process - This is implemented by the ComAdmin Catalog API (C#).
4) Starts our 3 application pools - This allows incoming HTTP hits to wake w3wp.exe processes and start serving requests.
This is the C# code which is responsible to start\stop application pools (WMI). This code runs in our service processes (XWinService.exe):
ConnectionOptions co = new ConnectionOptions();
ManagementScope scope = new ManagementScope(@"\\localhost\root\MicrosoftIISV2", co);
foreach (string appPool in AppPools)
{
string objPath = string.Format("IISApplicationPool.Name='W3SVC/AppPools/{0}'", appPool);
using (ManagementObject mc = new ManagementObject(objPath))
{
mc.Scope = scope;
if (Operation.ToLower() == "start")
{
mc.InvokeMethod("Start", null, null); // ### The problematic line of code ###
}
else if (Operation.ToLower() == "stop")
{
mc.InvokeMethod("Stop", null, null);
}
else if (Operation.ToLower() == "recycle")
{
mc.InvokeMethod("Recycle", null, null);
}
}
}
Now the issue:
Prior to rebooting the server, starting the service manually (from the services.msc tool) succeeds without any problems. also, stopping it is OK. We have set the service to start "Automatic", that is, will start when the server (Win2K3 SP2) starts and rebooted the server. When the server started (the login screen appeared), our service was "stuck" (status = "Starting") and will NEVER (it hang for 2 days!) start.
Analyzing the processes reveled the following:
1) The XWinService.exe process was stuck on the problematic line of code (### above ###). This hanged for 2 days until we killed the process. Please note: Shutting down the application pools (the Start procedure begins with a Stop procedure) did not hang!
2) From a DUMP file taken (with DebugDiag tool) from XWinService.exe during this "hang" we can see the thread which is waiting. This is the (native) stack trace of it:
Thread 6 - System ID 2784
Entry point mscorwks!Thread::intermediateThreadProc
Create time 11/19/2009 1:40:05 PM
Time spent in user mode 0 Days 00:00:00.078
Time spent in kernel mode 0 Days 00:00:00.781
This thread is making a COM call to multi-threaded apartment (MTA) in process 884
Function Source
ntdll!KiFastSystemCallRet
ntdll!NtRequestWaitReplyPort+c
rpcrt4!LRPC_CCALL::SendReceive+230
rpcrt4!I_RpcSendReceive+24
ole32!ThreadSendReceive+138
ole32!CRpcChannelBuffer::SwitchAptAndDispatchCall+112
ole32!CRpcChannelBuffer::SendReceive2+d3
ole32!CAptRpcChnl::SendReceive+ab
ole32!CCtxComChnl::SendReceive+1a9
rpcrt4!NdrProxySendReceive+43
rpcrt4!NdrClientCall2+206
rpcrt4!ObjectStublessClient+8b
rpcrt4!ObjectStubless+f
….
This thread is calling (via DCOM) a component in process 884, which is svchost.exe, running the following services: AeLookupSvc, AudioSrv, Browser, CryptSvc, dmserver, EventSystem, helpsvc, lanmanserver, lanmanworkstation, Schedule, seclogon, SENS, ShellHWDetection, TrkWks, winmgmt, wuauserv, WZCSVC.
As you can see the "winmgmt" service (responsible for WMI) is running in this process and our service depends on it, so our service will start after winmgmt is started (the same for IIS W3SVC service).
The svchost.exe process (884) was dumped and we can see a thread (waiting for a DCOM call to end) accessing process 2880 which is - wmiprvse.exe (I guess this is the WMI server. Don't know if it's relevent, but there were 2 instances of this process). This is the native call stack of the thread (in svchost.exe):
Thread 48 - System ID 3816
Entry point wbemcore!CCoreQueue::_ThreadEntry
Create time 11/19/2009 1:40:56 PM
Time spent in user mode 0 Days 00:00:00.00
Time spent in kernel mode 0 Days 00:00:00.00
This thread is making a COM call to multi-threaded apartment (MTA) in process 2880
Function Source
ntdll!KiFastSystemCallRet
ntdll!NtRequestWaitReplyPort+c
rpcrt4!LRPC_CCALL::SendReceive+230
rpcrt4!I_RpcSendReceive+24
ole32!ThreadSendReceive+138
ole32!CRpcChannelBuffer::SwitchAptAndDispatchCall+112
ole32!CRpcChannelBuffer::SendReceive2+d3
ole32!CAptRpcChnl::SendReceive+ab
ole32!CCtxComChnl::SendReceive+1a9
…
3) Setting our service to "Manual" and starting it (manually - after logging into the server or starting it remotely from a different server immediately after reboot) is OK - nothing hangs.
4) We deleted our service (from the registry!) and placed a batch file in the windows "startup" folder. This batch files calls the service's code, but runs it as a normal C# executable. After server reboot, it also hang on the same problematic line of code (again... for 2 days until we killed it).
5) Using ADSI (System.DirectoryServices) instead of WMI had the same results (starting the application pools hanged!).
We have been digging into this for the past 2 weeks...
My questions:
==========
1) Did anyone encounter the same issue?
2) Does anyone know why it hangs? Is there any additional service dependency we should take in mind?
3) Does anyone have a solution for this issue?
4) Why is this happening after a reboot only when the service to set to "Automatic" startup? If we do it manually - everything is Ok!
Thanks in advanced,
Omri
* Small update:**
We have noticed that on VMs (VMware stations) the service hangs after reboot for an average of ~40min, until it starts (note: it never fails to start, but 40min is way too much). An event log message is recorded in the system event log stating that our service hanged for more than 16min (source: Service Control Manager, Event ID: 7044).
On "regular" machines (real metals) the average time until the service starts is ~55 hours!!! Again, an event log entry is recorded as described above.
The avergae values were calculated from 10 differens VMs & 8 different "real" servers.