What happens to other users if the .NET worker process crashes?

A:

A new worker process will be started and the user would not know anything happened, unless IIS stops the application pool altogether because of rapid-fail protection (http://technet.microsoft.com/en-us/library/cc779127(WS.10).aspx).
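To make the rapid-fail idea concrete, here is a toy model of it, not IIS code. It assumes the rule is "N crashes inside a sliding window of T seconds stops the pool". The 5-crashes-in-5-minutes values are illustrative, so check your own pool's rapid-fail settings. The class and method names are hypothetical.

```python
from collections import deque


class RapidFailGuard:
    """Toy model of rapid-fail protection (illustrative, not IIS code).

    If the pool's worker process crashes max_crashes times within
    interval_s seconds, the pool is stopped instead of restarted.
    """

    def __init__(self, max_crashes=5, interval_s=300.0):
        self.max_crashes = max_crashes
        self.interval_s = interval_s
        self.crash_times = deque()
        self.pool_stopped = False

    def on_crash(self, now_s):
        """Record a crash at time now_s and return 'restart' or 'stop'."""
        if self.pool_stopped:
            return "stop"
        self.crash_times.append(now_s)
        # Forget crashes that have slid out of the time window.
        while self.crash_times and now_s - self.crash_times[0] > self.interval_s:
            self.crash_times.popleft()
        if len(self.crash_times) >= self.max_crashes:
            # Pool stays down; requests to it get 503 until an admin restarts it.
            self.pool_stopped = True
            return "stop"
        return "restart"
```

Four crashes ten seconds apart each get a restart, and a fifth within the window stops the pool. The same number of crashes spread out over a long period keeps getting restarted.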

Raj Kaimal 2010-03-26 18:45:19

A:

If it's an out-of-memory situation, IIS usually just recycles the app pool.

Pierreten 2010-03-26 18:46:15

A:

W3WP.exe is the process

IIS runs all web apps in a generic worker process, w3wp.exe. Whether you write in ASP.NET, ISAPI, or some other framework, the process that serves the web request is w3wp.exe. In the ASP.NET case, w3wp.exe loads the CLR and your application's assemblies (whose code is JIT-compiled) and services the requests through them. Other frameworks plug in differently, but the key point is that w3wp.exe is the process. This model started in IIS 6.0 and continues in IIS 7.0.

Unexpected Failures

If w3wp.exe fails unexpectedly, for any reason, all transactions it was handling will likely get 500 (Server Error) responses. IIS will start a new worker process in its place (Microsoft calls this "health monitoring"), which means the web app will continue to run. Users who did not have a request being served by the failing process at the time of failure will be unaware of any of this.

The HTTP 500 error that a client receives in this case is indistinguishable from a 500 error caused by an application error, for example an uncaught exception in your ASP.NET application code.

Requests that were inside the failing process cannot be recovered; they will result in 500 errors at the browser. A 503 Service Unavailable, by contrast, results from IIS actively refusing the request because a threshold, such as a limit on connections or queued requests, has been reached. A 503 does not result from an application failure, so you shouldn't expect to see 503s for in-flight transactions in the out-of-memory-and-crash scenario. On a heavily loaded system, however, you may see 503s as a secondary effect while the crashed process is being replaced. If that is really what you're seeing, you need a larger margin of safety to handle the load in the single-failure condition.
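As a toy model of the crash path just described (not IIS code; the function name is hypothetical), requests held by the dead process end up as 500s, while requests still waiting in the queue are picked up by the replacement process. The sketch assumes the replacement serves them successfully.

```python
def crash_and_replace(in_flight, queued):
    """Toy model of a worker-process crash (illustrative, not IIS code).

    in_flight: request ids being served by the process that crashed.
    queued:    request ids still waiting in the HTTP.sys queue.
    Returns a dict of request id -> HTTP status the client ends up seeing.
    """
    statuses = {}
    # Work held inside the dead process cannot be recovered.
    for req in in_flight:
        statuses[req] = 500
    # Health monitoring starts a replacement process, which drains the queue;
    # assume it serves those requests normally.
    for req in queued:
        statuses[req] = 200
    return statuses
```

For example, `crash_and_replace(["a", "b"], ["c"])` gives 500 for `a` and `b` and 200 for `c`.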

The Request Queue

IIS uses a hand-off approach for requests. As they arrive at the network layer (HTTP.sys), they are placed in a queue to be picked up by a worker process (WP). Any requests waiting in that queue will continue unaffected, though they might see a slight, temporary increase in latency (service time), since there is briefly one fewer process on the server to serve them. On a properly configured system, wait time in this queue is generally very short.

It is when this queue is full that you will see 503 errors.
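The queue behaviour can be sketched as a bounded FIFO: arrivals are accepted until it is full, and further arrivals are refused with 503 until a worker process takes something off the front. This is a simplified model, not HTTP.sys; the capacity stands in for the per-pool queue length setting, and the class name is hypothetical.

```python
from collections import deque


class RequestQueue:
    """Toy model of the request queue in front of the worker processes."""

    def __init__(self, capacity):
        self.capacity = capacity  # stands in for the pool's queue length limit
        self.pending = deque()

    def arrive(self, request_id):
        """Queue the request, or reject it with 503 if the queue is full."""
        if len(self.pending) >= self.capacity:
            return 503
        self.pending.append(request_id)
        return None  # accepted; a worker process will pick it up later

    def take(self):
        """A worker process pulls the oldest waiting request, if any."""
        return self.pending.popleft() if self.pending else None
```

With a capacity of 2, a third arrival is refused with 503; once a worker takes one request, there is room again.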

Auto restart of W3WP.exe

IIS has an auto-restart (or "nanny") facility, called recycling, through which it restarts worker processes after they exceed configured thresholds such as memory size, number of requests, or running time. In all those cases, IIS quiesces and restarts the worker process when the threshold is reached. These proactive restarts normally do not disrupt any requests. When IIS decides that a worker process must be restarted, it stops sending new requests to that WP. Existing requests are drained: any in-flight transactions in that WP are allowed to complete normally, and once they have all completed, the old WP exits. A new WP takes its place and picks up new requests from the dispatch queue; by default IIS overlaps the two, starting the replacement before the old process has finished draining. This is all transparent to users and browsers.

I say "normally" because the worker process may have become truly sick at the same moment the threshold is reached. In that case w3wp.exe may not respond to IIS within the configured shutdown ("quiesce") timeout, and IIS eventually has to kill the process even though it hasn't reported that all of its in-flight requests have completed. This should be exceedingly rare, because it requires two distinct exceptional conditions at once, but it happens. In this case, the in-flight requests will once again get 500 errors.
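Here is a toy model of the drain-then-kill behaviour (not IIS code; the function name and the 90-second limit in the test are illustrative). It assumes new requests already go to the replacement process, so only the old process's in-flight requests are modelled. Anything that finishes within the shutdown limit completes normally, and anything that would outlive it, such as a request stuck in a hung process, is cut off.

```python
def recycle(remaining_s, shutdown_limit_s):
    """Toy model of a proactive recycle (illustrative, not IIS code).

    remaining_s maps request id -> seconds that request still needs.
    Requests that finish within the shutdown limit complete (200);
    the rest are cut off when the process is killed (500 in this model).
    """
    return {req: 200 if secs <= shutdown_limit_s else 500
            for req, secs in remaining_s.items()}
```

In the healthy case every request drains within the limit; only the rare "sick at recycle time" case produces 500s.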

Web gardens

IIS also allows an application pool to run multiple worker processes on a single server. Microsoft calls this a "web garden", a play on "web farm". If you have a web garden set up, transactions being served by w3wp.exe instances other than the failing one will continue unaffected. "Unaffected" presumes, though, that the out-of-memory error is localized to that one process and is not a system-wide problem.
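In a web garden, the blast radius of a crash is one process. The toy model below (not IIS code; the function name and process ids are hypothetical) maps each request to the w3wp.exe instance serving it. It assumes the fault is local to the crashed process.

```python
def crash_in_web_garden(assignments, crashed_pid):
    """Toy model of a crash in a web garden (illustrative, not IIS code).

    assignments maps request id -> pid of the w3wp.exe instance serving it.
    Only requests inside the crashed process fail; the others complete.
    """
    return {req: 500 if pid == crashed_pid else 200
            for req, pid in assignments.items()}
```

With requests spread across processes 101, 102 and 103, a crash in 101 fails only the requests assigned to 101.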

Bottom Line

The bottom line is that there is no substitute for your own testing. The configuration options are pretty broad, from restart thresholds to web gardens and so on. The failure modes also tend to be complex and varied, whether they involve memory, timeouts, an overloaded server, or something else. You'll want to understand what to expect.

PS: This Q&A really belongs on serverfault.com!

references:
http://blogs.iis.net/thomad/archive/2008/05/07/the-iis-process-model-features.aspx

Cheeso 2010-03-26 19:01:53

Thank you for the thorough response. Re: Server Fault, what is the remedy for that? Should I cross-post and reference the correct response, or can an admin move the question?

Jason Slocomb 2010-03-26 23:16:07

I don't know how it works but I think it needs to be voted closed, and then migrated.

Cheeso 2010-03-28 21:47:20

I will close it, then, since this is far and away the most popular answer. Thanks, Cheeso.

Jason Slocomb 2010-04-14 21:08:54

A:

As the other answers say, in most cases everything just restarts, and most users who did not have a pending request at the time will not notice much more than a delay.

However, if your application uses session variables with InProc session state, all session variables for all users will be lost when the app pool restarts, because they live in the worker process's memory. This may or may not have a negative effect on the users, depending on what you're doing with the session variables. You can avoid this by switching to StateServer or SQL Server session storage.
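Here is a toy model of why InProc state disappears (not ASP.NET code; all class and function names are hypothetical). A restart creates a fresh process object, so a dictionary held inside the old one is gone. A store that lives outside the process, standing in for StateServer or SQL Server, is simply handed to the new process.

```python
class OutOfProcStore:
    """Stand-in for StateServer / SQL Server session storage: it lives
    outside the worker process, so it survives a worker restart."""

    def __init__(self):
        self.data = {}


class Worker:
    """Stand-in for a w3wp.exe instance (hypothetical, illustration only)."""

    def __init__(self, store=None):
        self.store = store
        self.inproc = {}  # InProc session state: held in this process's memory

    def _sessions(self):
        return self.store.data if self.store is not None else self.inproc

    def set(self, session_id, key, value):
        self._sessions().setdefault(session_id, {})[key] = value

    def get(self, session_id, key):
        return self._sessions().get(session_id, {}).get(key)


def restart(old):
    """Replace the process; only an external store is carried over."""
    return Worker(old.store)
```

With `Worker()` a value set before `restart` reads back as `None` afterwards, while with `Worker(OutOfProcStore())` it survives.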

Joel Mueller 2010-03-26 19:01:58
