I am calling 5 external servers to retrieve XML-based data for each request for a particular webpage on my IIS 6 server. The current volume is 3-5 incoming requests per second, which means 15-20 outgoing requests per second.

About 99% of the outgoing requests from my server (the client) to the external servers (the server) succeed, but about 100-200 per day fail with a "The operation has timed out" exception.

This suggests a resource problem on my server, such as a shortage of sockets or ports, or a thread lock. The problem with this theory is that the failures are entirely random (there is never a run of consecutive failures), and two of the external servers account for the majority of them.

My question is: how can I further diagnose these exceptions to determine whether the problem is on my end (the client) or on the other end (the servers)?

The volume of requests precludes putting an analyzer on the wire; it would be very difficult to capture these few exceptions. I have adjusted the CONNECTIONS and THREADS settings in my machine.config, and the basic code looks like this:

Dim hRequest As HttpWebRequest
Dim responseTime As String
Dim objWatch As New Stopwatch

Try

  ' calculate time it takes to process transaction
  objWatch.Start()

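  ' WebRequest.Create returns a WebRequest, so cast it (required under Option Strict)
  hRequest = DirectCast(System.Net.WebRequest.Create(url), HttpWebRequest)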
  ' set some defaults
  hRequest.Timeout = 5000
  hRequest.ReadWriteTimeout = 10000
  hRequest.KeepAlive = False ' to prevent open HTTP connection leak
  hRequest.SendChunked = False
  hRequest.AllowAutoRedirect = True
  hRequest.MaximumAutomaticRedirections = 3
  hRequest.Accept = "text/xml"
  hRequest.Proxy = Nothing 'do not waste time searching for a proxy 
  hRequest.ServicePoint.Expect100Continue = False

  Dim feed As XDocument = Nothing
  ' *Using* disposes (and therefore closes) the response, its stream and the reader,
  ' so explicit Close() calls inside the blocks are not needed
  Using hResponse As HttpWebResponse = DirectCast(hRequest.GetResponse(), HttpWebResponse)
    Using reader As XmlReader = XmlReader.Create(hResponse.GetResponseStream())
      feed = XDocument.Load(reader)
    End Using
  End Using

  objWatch.Stop()
  ' Work here with returned contents in "feed" document
  Return XXX' some results here

Catch ex As Exception

  objWatch.Stop()
  ' hRequest is still Nothing if WebRequest.Create itself threw
  If hRequest IsNot Nothing Then hRequest.Abort()
  Return Nothing

End Try

Any suggestions?

A: 

By default, HttpWebRequest limits you to 2 connections per HTTP/1.1 server. So if your requests take time to complete and incoming requests are queuing up on the server, you will run out of connections and get timeouts.

You should raise the maximum number of outgoing connections through ServicePointManager:

ServicePointManager.DefaultConnectionLimit = 20 ' or some larger value
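One detail worth checking: as far as I know, each ServicePoint copies DefaultConnectionLimit when it is created, so changing the default later does not affect hosts that have already been contacted. A minimal sketch, assuming the value is set once at startup (for example from Application_Start in Global.asax); ConfigureConnectionLimit and EffectiveLimitFor are hypothetical helper names, and the proxy argument is Nothing to match the question's Proxy = Nothing:

```vbnet
Imports System
Imports System.Net

Module ConnectionLimitSetup

    ' Hypothetical helper: call once at startup, before any HttpWebRequest is created.
    Public Sub ConfigureConnectionLimit(ByVal limit As Integer)
        ' Only ServicePoints created after this assignment pick up the new default.
        ServicePointManager.DefaultConnectionLimit = limit
    End Sub

    ' Hypothetical helper: returns the limit that actually applies to this URL's host
    ' when no proxy is used. Note that it creates the ServicePoint if none exists yet.
    Public Function EffectiveLimitFor(ByVal url As String) As Integer
        Dim sp As ServicePoint = ServicePointManager.FindServicePoint(New Uri(url), Nothing)
        Return sp.ConnectionLimit
    End Function

End Module
```

Logging the per-host value (or hRequest.ServicePoint.ConnectionLimit) when a timeout occurs checks the limit that was really in force, which reading DefaultConnectionLimit alone does not.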
feroze
Long before this, I changed the *MAX CONNECTIONS* setting in machine.config to 200 and set *ServicePointManager.DefaultConnectionLimit = 200* just to test. In addition, when a timeout exception occurred I checked ServicePointManager.DefaultConnectionLimit to make sure it still had the high value. I also set the maximum number of ports: [HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters] MaxUserPort = 5000 (Default = 5000, Max = 65534). See: http://smallvoid.com/article/winnt-tcpip-max-limit.html
dalej
If it is a resource scarcity issue, it must be one of connections, ports or threads. Could there be anything else? I should have enough connections and ports given my settings, and Perfmon shows 12K threads at any one time, but *TCPIP/Connection Failures* increases by one every 5-10 seconds. Of course, it could just be the other end(s), but how is one to diagnose this? Better yet, how does one maintain performance when many timeouts are occurring? I am really mystified by this.
dalej
Can you give the exact exception stacktrace?
feroze
The error is simply: "Exception is: The operation has timed out ResponseTime: 3001 millisecs and Url is: blah, blah". The response time of 3001 ms is the result of resetting the timeout to 3 seconds.
dalej
A: 

You said that you are making 5 outgoing requests for each incoming request to the ASP page. Is that 5 different servers, or the same server?

Do you wait for the previous request to complete before issuing the next one? Is the timeout happening while the request is waiting for a connection, or during the request/response?
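To help tell these two cases apart from inside the application, the Catch block could log WebException.Status together with the request's ServicePoint usage. A sketch, assuming the request object is still in scope when the exception is caught; DescribeFailure is a hypothetical helper, and because CurrentConnections is read after the failure, "poolFull" is only a rough hint that the request may have been queued locally:

```vbnet
Imports System
Imports System.Net

Module TimeoutDiagnostics

    ' Hypothetical helper: build one log line per failed request so failures can
    ' later be grouped by host, failure kind and connection usage.
    Public Function DescribeFailure(ByVal request As HttpWebRequest, _
                                    ByVal ex As WebException, _
                                    ByVal elapsedMs As Long) As String
        Dim sp As ServicePoint = request.ServicePoint
        ' If every allowed connection to this host is busy, a new request has to
        ' wait locally for a free one, and that wait counts against its timeout.
        Dim poolFull As Boolean = sp.CurrentConnections >= sp.ConnectionLimit
        Return String.Format("host={0} status={1} elapsedMs={2} connections={3}/{4} poolFull={5}", _
                             request.RequestUri.Host, ex.Status, elapsedMs, _
                             sp.CurrentConnections, sp.ConnectionLimit, poolFull)
    End Function

End Module
```

In the question's code this would mean adding a `Catch wex As WebException` clause ahead of the general `Catch ex As Exception` and passing it hRequest, wex and objWatch.ElapsedMilliseconds. Grouping the resulting lines by host would also show whether the two servers that account for most failures behave differently from the rest.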

If the timeout is happening during the request/response, then the target server is probably under stress. The only way to find out if this is the case is to run Wireshark/Netmon on one of the machines and look at the network trace to see whether the request from the app is even making it through to the server, and if it is, whether the target server is responding within the given timeout.

If this is a thread starvation issue, one way to diagnose it is to attach the windbg.exe debugger to the w3wp.exe process when you start getting timeouts. Then load the sos.dll debugging extension and run the !threads command, followed by the !threadpool command. This shows how many worker threads and completion port threads are in use and how many remain. If the number of available completion port threads or worker threads is low, that will contribute to the timeouts.
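As a lighter-weight, in-process complement to !threadpool, the same Catch block could record how many CLR thread-pool threads were still free when the timeout happened. A minimal sketch; ThreadPoolSnapshot is a hypothetical helper name:

```vbnet
Imports System
Imports System.Threading

Module ThreadPoolDiagnostics

    ' Hypothetical helper: summarize CLR thread-pool usage for a log line.
    Public Function ThreadPoolSnapshot() As String
        Dim availWorker As Integer, availIo As Integer
        Dim maxWorker As Integer, maxIo As Integer
        ' Both calls fill their arguments ByRef.
        ThreadPool.GetAvailableThreads(availWorker, availIo)
        ThreadPool.GetMaxThreads(maxWorker, maxIo)
        ' "In use" is the configured maximum minus what is still available.
        Return String.Format("worker {0}/{1} in use, IOCP {2}/{3} in use", _
                             maxWorker - availWorker, maxWorker, _
                             maxIo - availIo, maxIo)
    End Function

End Module
```

If timeouts line up with snapshots where almost no worker or completion port threads are available, that points to starvation on the client side rather than a slow remote server.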

Alternatively, you can monitor the ASP.NET and System.Net performance counters. See whether the ASP.NET request queue is increasing monotonically; this might indicate that your outgoing requests are not completing fast enough.
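Sampling the queue counter from code makes a steady rise easy to spot without watching Perfmon. A sketch, assuming the counter is registered as "Requests Queued" in the "ASP.NET" category, as it appears in Perfmon; SampleRequestsQueued and IsSteadilyRising are hypothetical helper names:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.Threading

Module QueueMonitor

    ' Hypothetical helper: read ASP.NET "Requests Queued" several times.
    ' Returns Nothing if the counter is not installed on this machine.
    Public Function SampleRequestsQueued(ByVal samples As Integer, _
                                         ByVal intervalMs As Integer) As List(Of Single)
        If Not (PerformanceCounterCategory.Exists("ASP.NET") AndAlso _
                PerformanceCounterCategory.CounterExists("Requests Queued", "ASP.NET")) Then
            Return Nothing
        End If
        Dim values As New List(Of Single)()
        ' Open the counter read-only; it is a raw count, so NextValue is the current value.
        Using counter As New PerformanceCounter("ASP.NET", "Requests Queued", True)
            For i As Integer = 1 To samples
                values.Add(counter.NextValue())
                If i < samples Then Thread.Sleep(intervalMs)
            Next
        End Using
        Return values
    End Function

    ' True only if every sample is strictly larger than the one before it.
    Public Function IsSteadilyRising(ByVal values As IList(Of Single)) As Boolean
        If values Is Nothing OrElse values.Count < 2 Then Return False
        For i As Integer = 1 To values.Count - 1
            If values(i) <= values(i - 1) Then Return False
        Next
        Return True
    End Function

End Module
```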

Sorry, there are no easy answers here. There are a lot of avenues you will need to explore. If I were you, I would start by attaching windbg.exe to w3wp when you start getting timeouts and doing what I described earlier.

feroze
The outgoing requests (and the incoming ones, for that matter) all go through one server with one IP address (10 Mbit Ethernet). The calls are currently made synchronously only because I wanted to make sure the code was working before converting it to asynchronous calls. The ASP.NET > Requests Queued counter is rarely non-zero, so that does not lead anywhere. The volume of requests really limits the use of a debugger or Wireshark: finding one of 100 daily timeouts in an average day of over 1M requests is tough. Many thanks for your comments.
dalej
It's not that difficult: just attach windbg.exe to the W3WP process when you start getting timeout exceptions, dump the process (.dump /ma <filename>) and detach the debugger. Then you can restart the debugger, load the dump you created, and debug away...
feroze