My application has 50 service endpoints (such as /mysite/myService.svc). It's hosted in IIS. Intermittently (once every two or three days) a service stops responding. It's never the same service that hangs. While a service is hung, some of the other services work fine and some others are also hung.

All clients (from different computers) get this error:

ServiceModel.CommunicationException 

Message: An error occurred while receiving the HTTP response to 
https://server/mysite/myservice1.svc.

This could be due to the service endpoint binding not using the HTTP 
protocol. This could also be due to an HTTP request context being
aborted by the server (possibly due to the service shutting down). 
See server logs for more details.

No exceptions are raised by the server when the client attempts to call the service that is hung. All I have is that error on the client side.

I have to manually recycle the application pool to fix the problem.

Do you know what could be the cause? How can I investigate this issue? I'm willing to take a memory dump of the worker process when a service is hung but I would not know what to search for in the dump.

Update (Aug 13 2009): I have almost ruled out the idea that the server runs out of connections (see comment in Shiraz Bhaiji's answer). I might have a new lead: I log all server-side exceptions in a log file. So in theory, when this occurs on the client, no exceptions are raised on the server; otherwise I'd have proof of that in my logs. But what if an error does occur on the server but is happening at a low level where exceptions are not routed to my exception handling code? I have posted this question about scenarios where low level exceptions cannot be handled. I'll keep you informed of the progress of my investigation.

A: 

I have not come across this particular issue, but I would suggest turning on tracing and message logging for the WCF service in the config for the service and/or the client app (if you have control over that). I've done this in the last few days for a service that I needed to troubleshoot.

The MSDN link here is a good starting point.

Also see the table in this post for the varying levels of trace detail you can configure. There are several levels which can go from exception only logging to full message details. It is quite quick to set this up in the app.config file.
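
As a rough sketch of what that configuration might look like (the log path, the listener name "xml", the Warning level and the message limit are my assumptions, not values taken from the linked pages), something along these lines goes in web.config for an IIS-hosted service, or in app.config for a client:

```xml
<configuration>
  <system.diagnostics>
    <sources>
      <!-- WCF runtime traces: warnings, errors and activity boundaries -->
      <source name="System.ServiceModel"
              switchValue="Warning, ActivityTracing"
              propagateActivity="true">
        <listeners>
          <add name="xml" />
        </listeners>
      </source>
      <!-- Logged messages, enabled by the messageLogging element below -->
      <source name="System.ServiceModel.MessageLogging">
        <listeners>
          <add name="xml" />
        </listeners>
      </source>
    </sources>
    <sharedListeners>
      <!-- Assumed path; the worker process identity needs write access to it -->
      <add name="xml"
           type="System.Diagnostics.XmlWriterTraceListener"
           initializeData="C:\logs\wcf-trace.svclog" />
    </sharedListeners>
    <trace autoflush="true" />
  </system.diagnostics>
  <system.serviceModel>
    <diagnostics>
      <!-- Illustrative limits; lower them if the log grows too quickly -->
      <messageLogging logEntireMessage="true"
                      logMalformedMessages="true"
                      logMessagesAtServiceLevel="true"
                      logMessagesAtTransportLevel="true"
                      maxMessagesToLog="3000" />
    </diagnostics>
  </system.serviceModel>
</configuration>
```

Since the hang only shows up every few days, starting at Warning rather than Verbose keeps the file small enough to leave tracing on until the problem reoccurs.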

To parse the log file output, use the SvcTraceViewer.exe that comes with the Windows SDK, which, if you have it installed, should be located in this folder: C:\Program Files\Microsoft SDKs\Windows\v6.0\Bin

Henryk
A: 

Sounds like you are running out of connections.

By default, the receiveTimeout on a WCF binding is 10 minutes, so a connection that is never closed is held open for up to 10 minutes.

When you recycle the app pool all connections are closed, and therefore things work again.

To fix it, check your code to make sure that you close connections and dispose of proxies.
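
If you want to test this theory from the server side, the idle timeout can be shortened in the service's config. A minimal sketch, assuming a wsHttpBinding (the post does not say which binding is used); the binding name "shortIdle", the service and contract names, and the one-minute value are all hypothetical:

```xml
<system.serviceModel>
  <bindings>
    <wsHttpBinding>
      <!-- "shortIdle" is a made-up name; 00:01:00 is an illustrative value.
           The default receiveTimeout is 00:10:00. -->
      <binding name="shortIdle" receiveTimeout="00:01:00" />
    </wsHttpBinding>
  </bindings>
  <services>
    <!-- Hypothetical service and contract names -->
    <service name="MySite.MyService">
      <endpoint address=""
                binding="wsHttpBinding"
                bindingConfiguration="shortIdle"
                contract="MySite.IMyService" />
    </service>
  </services>
</system.serviceModel>
```

With a shorter timeout, sessions left open by clients that never closed their proxy are dropped sooner, which should make a leak of this kind easier to spot.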

Shiraz Bhaiji
The clients do close the connection (we control the client app) but that is a good point nonetheless. If the client app crashes (or is killed), for instance, it might not have a chance to close the proxy. But in such a case, wouldn't the server throw an error if it has reached the maximum number of connections?
Sly
I will try and lower the receiveTimeout on the server. If I start getting timeouts, that will indicate a problem with the way the client is closing the proxy.
Sly
I have lowered the receiveTimeout. That did not change anything. I'm about to rule out the idea that I run out of connections. I did a test in a controlled environment: when a server runs out of connections, the error that the clients get is a TimeoutException, not a CommunicationException like the one I get in production. So I think it's something else.
Sly
What are your settings for recycling of the application pool? Could you for example "solve" the problem by recycling the application pool every night?
Shiraz Bhaiji
First: thank you for taking the time to help me, I really appreciate it. Second: all recycling settings are disabled. We never enable automatic recycling on our servers because that hides bugs. Are you suggesting that as a solution? Or are you suggesting that I enable recycling every night for a few days to see if the problem will disappear? I'd be willing to do the latter. But let's suppose it does make the problem disappear: how will that help me understand the root cause of the problem? What would you do next?
Sly
Try it for a few days. If the problem goes away, you at least know that it is something that builds up over time: it could be a leak of memory, connections, threads, and so on. Then you could do a code review to check that things are being closed and disposed of properly. Good luck.
Shiraz Bhaiji
A: 

To resolve this, we set establishSecurityContext to False on the binding.
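
For anyone looking for where that setting lives, here is a sketch of the binding config. It assumes a wsHttpBinding with message credentials over HTTPS; the binding name, the security mode and the credential type are placeholders, not necessarily what we use:

```xml
<system.serviceModel>
  <bindings>
    <wsHttpBinding>
      <!-- "noSecurityContext" is a made-up name -->
      <binding name="noSecurityContext">
        <!-- mode and clientCredentialType are assumptions for illustration -->
        <security mode="TransportWithMessageCredential">
          <message clientCredentialType="UserName"
                   establishSecurityContext="false" />
        </security>
      </binding>
    </wsHttpBinding>
  </bindings>
</system.serviceModel>
```

The same value has to be set on the client's binding as well, since both sides must agree on whether a security context is established.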

Sly