
Hi guys.

We have a WCF service self-hosted in a Windows Service. We're currently seeing some really strange behavior: approximately every 23 hours, the service throws an exception for every call made to it, with the following error:

Server was unable to process request. ---> The request channel timed out while waiting for a reply after 00:01:00. Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding. The time allotted to this operation may have been a portion of a longer timeout.---> The HTTP request to 'http://servername:8016/servicio/Autorizaciones' has exceeded the allotted timeout of 00:01:00. The time allotted to this operation may have been a portion of a longer timeout.---> The operation has timed out.

The failure lasts between 4 and 6 minutes. Then, without anyone touching anything, the service recovers and subsequent calls succeed for the next 23 hours, until the error occurs again.
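To line up diagnostics (performance counters, traces) with the next outage, it can help to estimate the period from the observed start times of past failure windows and project the next few. This is a minimal sketch, assuming you have collected those start times from your own logs; the function names are hypothetical and nothing here reads WCF data directly.

```python
from datetime import datetime, timedelta
from typing import List, Sequence


def estimate_period(starts: Sequence[datetime]) -> timedelta:
    """Mean gap between consecutive failure-window start times."""
    if len(starts) < 2:
        raise ValueError("need at least two failure start times")
    ordered = sorted(starts)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    # sum() needs a timedelta start value; timedelta / int is supported.
    return sum(gaps, timedelta()) / len(gaps)


def next_windows(starts: Sequence[datetime], count: int = 3) -> List[datetime]:
    """Project the next `count` expected failure starts from the mean period."""
    period = estimate_period(starts)
    last = max(starts)
    return [last + period * (i + 1) for i in range(count)]
```

A period of 23 hours rather than 24 means the outage moves about one hour earlier each day, so a job scheduled at a fixed wall-clock time would not line up with it; that is worth keeping in mind when checking for scheduled tasks.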

As mentioned above, the WCF service is self-hosted in a Windows Service running on Windows Server 2008. The clients calling the service are two different ASP.NET web services, one running on the same server and the other on a virtual server in the production environment. Both clients have shown the same issue.

The complete environment is set up as follows: phone calls are received by an IVR system (node 1), which calls a web service (node 2) that retrieves some information about the calling customer. Once the customer approves the operation, the IVR calls the web service (node 2), which relays the call to the WCF service (node 3) to process a credit operation. The WCF service then performs a TCP/IP socket operation over a VPN connection to another entity (node 4). That exchange takes between 3 and 10 seconds; the result is recorded in a database and then sent back to the customer along the same path (nodes 3, 2 and 1). The platform processes about 2,000 transactions a day, 24/7, apart from those that fail with the timeout. The transaction is relayed through a second service for security reasons. Each call exchanges about 200 to 300 bytes of data.

I've already tried most of the workarounds posted here on Stack Overflow (http://stackoverflow.com/questions/981475/wcf-timeout-exception-detailed-investigation), including the ones suggested there and others found via Google. The error persists.

The TCP/IP socket operations are logged to a text file, and the log shows no problems with the external entity's response times; the longest was 9 seconds. We also traced the database operations, and that trace showed no performance issues either.

The service's concurrency mode is set to ConcurrencyMode.Multiple. Before going into production we ran a stress test with ten clients making repeated calls for 2 hours, during which the WCF service processed about 30,000 transactions with no sign of performance degradation. In any case, I have ruled out a concurrency issue, because the average time between transactions is about one minute and the longest transaction takes about 9 seconds. Besides, all other transactions complete successfully, regardless of the load on the service.
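The concurrency argument can be checked with a little arithmetic based on Little's law (average requests in flight = arrival rate × time per request). This sketch assumes the 2,000 daily transactions are spread evenly over 24 hours, which real IVR traffic probably is not, so treat the results as rough averages rather than peak figures. The helper names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def mean_gap_seconds(transactions_per_day: float) -> float:
    """Average time between transactions if traffic were perfectly even."""
    return SECONDS_PER_DAY / transactions_per_day


def expected_in_flight(transactions_per_day: float, seconds_per_call: float) -> float:
    """Little's law: L = lambda * W, the average number of concurrent calls."""
    arrival_rate = transactions_per_day / SECONDS_PER_DAY  # calls per second
    return arrival_rate * seconds_per_call


def load_ratio(test_calls: float, test_seconds: float,
               prod_calls: float, prod_seconds: float) -> float:
    """How many times the stress-test rate exceeded the production rate."""
    return (test_calls / test_seconds) / (prod_calls / prod_seconds)


if __name__ == "__main__":
    print(f"mean gap: {mean_gap_seconds(2000):.1f} s")
    print(f"in flight at 10 s/call: {expected_in_flight(2000, 10):.2f}")
    print(f"stress test vs production: {load_ratio(30000, 2 * 3600, 2000, SECONDS_PER_DAY):.0f}x")
```

With these figures the even-spread gap is about 43 seconds (the same order as the one-minute average quoted above), the service would have on average about 0.23 calls in flight even at the 10-second worst case, and the stress test ran at roughly 180 times the production rate.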

I cannot increase the one-minute timeout, because the service executes e-commerce operations and nothing in it actually takes more than a few seconds to complete.

These are the facts, and I hope you can come up with something I haven't tried yet. Please keep in mind when answering that this is a mission-critical service, and the changes or configuration we can apply in the production environment are very limited.

Thanks in advance.

A: 

The fact that it happens every 23 hours sounds suspiciously like an application pool recycle (although an outage of 4 to 6 minutes seems too long for that).

Another remote possibility is a Generation 2 garbage collection, but an outage that long would also be unusual for one.

You can track both using the relevant built-in performance counters:

.NET CLR Memory Performance Counters: # Gen 2 Collections

WCF Performance Counters

[Are you sure there isn't some sort of periodic backup being kicked off? Do you have a virus scanner on that machine?]

Mitch Wheat
Hi Mitch. Can you please specify which performance counters are most appropriate for tracking this event? Remember that the service is self-hosted in a Windows service. Thanks for your answer.
Josias
Thanks again, Mitch. I forgot to mention that there are no periodic backups or any other scheduled tasks at those times, and the virus scanner is installed and running normally.
Josias
A: 

Are your clients in a different time zone? Do their clocks show the same time as the server's?

Try setting MaxClockSkew.
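A sketch of the idea behind MaxClockSkew, assuming the binding uses message security with timestamps: the receiver compares the sender's timestamp with its own clock and rejects the message if the two differ by more than the allowed skew (WCF's default is five minutes). This is an illustration in Python, not WCF code; `within_skew` is a hypothetical name. Comparing timezone-aware values normalizes both to UTC, so a time-zone difference alone is not skew; only clocks that disagree about the actual instant are.

```python
from datetime import datetime, timedelta, timezone

# Mirrors WCF's default MaxClockSkew of five minutes (illustrative constant).
DEFAULT_MAX_SKEW = timedelta(minutes=5)


def within_skew(sent: datetime, received: datetime,
                max_skew: timedelta = DEFAULT_MAX_SKEW) -> bool:
    """True if the sender's and receiver's clocks agree within max_skew."""
    if sent.tzinfo is None or received.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    # Subtracting aware datetimes compares the underlying UTC instants.
    return abs(received - sent) <= max_skew
```

For example, a message stamped 12:00 UTC and received at 09:01 in a UTC-3 zone is only one minute apart and passes, while a receiver whose clock runs ten minutes fast would reject it.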

JeffN825
Hi Jeff. All the servers (including the database servers) are synchronized with a local NTP server. However, if you could explain how different time zones could cause the timeout, we would appreciate it. Thanks in advance.
Josias