I have an application that spiders websites for information. After 20-45 minutes of creating HttpWebRequests, a bunch of them start returning timeouts. One thing we do is attach an anonymous function to BindIPEndPointDelegate to give each request a specific local IP, since we round-robin through about 150 IPs.

I'm setting up the HttpWebRequest object with the following settings:

  • Setting UserAgent
  • Setting KeepAlive to false so that the IP isn't re-used
  • Setting Timeout to 60000 (60 seconds)
  • Setting ReadWriteTimeout to 60000 (60 seconds)
  • Setting Proxy to null
  • Setting Accept to */*
  • Setting CookieContainer to new CookieContainer
  • Setting Pipelined to true
  • Setting AutomaticDecompression to Deflate & GZip
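
For readers following along, the settings listed above might look roughly like this in code. This is a reconstruction from the description, not the asker's actual code: the `SpiderRequestFactory` class, the `Create` method, and the User-Agent string are made up for illustration.

```csharp
using System;
using System.Net;

static class SpiderRequestFactory
{
    // Hypothetical helper: builds a request configured as in the list above
    // and asks for its connection to be bound to the given local address.
    public static HttpWebRequest Create(string url, IPAddress localAddress)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);

        request.UserAgent = "ExampleSpider/1.0";   // placeholder value
        request.KeepAlive = false;                 // do not reuse the connection
        request.Timeout = 60000;                   // 60 seconds, in milliseconds
        request.ReadWriteTimeout = 60000;          // 60 seconds, in milliseconds
        request.Proxy = null;
        request.Accept = "*/*";
        request.CookieContainer = new CookieContainer();
        request.Pipelined = true;
        request.AutomaticDecompression =
            DecompressionMethods.Deflate | DecompressionMethods.GZip;

        // Port 0 lets the operating system pick the local port.
        request.ServicePoint.BindIPEndPointDelegate =
            (servicePoint, remoteEndPoint, retryCount) => new IPEndPoint(localAddress, 0);

        return request;
    }
}
```

One thing worth checking with a setup like this: on the .NET Framework, ServicePoint objects are shared between requests to the same host, so the bind delegate applies to the host's ServicePoint rather than to a single request.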

The application is using .NET 4.0 and running on Windows Server 2008 R2.

This definitely seems like something application/TCP/.NET related because if I restart the application it runs fine again. Also it appears more or less like the ones timing out are just queued up waiting on a local port or something.

Any ideas?

A: 

Could it be an IDS on the remote end thinking you're an attacker and blocking you?

Yuliy
Wouldn't it continue to block me though? It seems like it happens for a short interval then goes away.
Chad Moran
Depends on the setup - you can configure how long you want to block a perceived attack
arootbeer
Are you able to run any diagnostics on your server, for example TCPView, to see what is breaking the connection?
Monkieboy
+1  A: 

I would guess it is due to ThreadPool-related issues.

Sudesh Sawant
A: 

My guess is that not all objects are being disposed correctly, so some TCP ports are being kept open. Check which objects implement IDisposable. At the very least, the results of GetResponse and GetResponseStream are IDisposable and should be disposed correctly.

Pieter
Everything that is IDisposable is wrapped in a using statement.
Chad Moran
+3  A: 

You don't say much about the code you actually use to perform the requests but, anyway, here are my guesses:

  1. You are using BeginGetResponse()/EndGetResponse() with a callback and the callback takes too long to complete (or blocks!). This could cause a deadlock in the threadpool if you are issuing a lot of requests in a short period of time.

  2. Since you are not reusing the connections and, again, if the requests happen very fast and non-stop, you might run out of sockets (last time I tried, ~3k per interface on Windows). If setting KeepAlive to true fixes your problem, this is it.

  3. You are not calling Dispose()/Close() on the HttpWebResponse or on the Stream you get from the response. This might work for a little while, until you hit the connection limit of 2 (from the MSDN docs) or 6 (configuration file default) in your application configuration settings (system.net/connectionManagement/add[address="*",maxconnection="6"]). A simple way to test whether this is the problem is to set the limit to 1 and see if the problem happens earlier than before.

Btw, setting KeepAlive to false and Pipelined to true does not make sense.
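
To illustrate guess 3, here is a minimal sketch of reading a response so that the response, its stream, and the reader are all released even when an exception is thrown. The `PageFetcher` class and `Fetch` method are hypothetical names, and the parameter is typed as `WebRequest` (the base class of `HttpWebRequest`) only to keep the sketch general.

```csharp
using System;
using System.IO;
using System.Net;

static class PageFetcher
{
    // Hypothetical helper: returns the response body as text.
    // GetResponse() throws WebException on timeouts and on HTTP error
    // statuses, so callers would normally catch that.
    public static string Fetch(WebRequest request)
    {
        using (WebResponse response = request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream))   // decodes as UTF-8 by default
        {
            return reader.ReadToEnd();
        }
    }
}
```

If the response is never closed, its connection is not handed back, and later requests to the same host can end up waiting for a free connection until they time out.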

Gonzalo
The easy way to diagnose port exhaustion would be with `netstat`.
Steven Sudit
@Steven: yeah, on Linux I would do something like "netstat -nt". If there are a lot of CLOSE_WAIT, it would be case 3 above. If there are a lot of TIME_WAIT, it would be case 2 above. Increasing 'ulimit -n' would help case 2, but case 3 is an application problem.
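
The same kind of count can also be taken from inside a .NET process, which may be handy for logging it at the moment the timeouts start. A rough sketch, assuming a hypothetical `TcpStateReport` helper; note that it counts connections for the whole machine, not just the current process:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.NetworkInformation;

static class TcpStateReport
{
    // Counts the machine's active TCP connections, grouped by state
    // (for example TimeWait, CloseWait, Established).
    public static Dictionary<TcpState, int> CountByState()
    {
        return IPGlobalProperties.GetIPGlobalProperties()
            .GetActiveTcpConnections()
            .GroupBy(connection => connection.State)
            .ToDictionary(group => group.Key, group => group.Count());
    }

    // Prints one line per state, most frequent first.
    public static void Print()
    {
        foreach (var pair in CountByState().OrderByDescending(p => p.Value))
            Console.WriteLine("{0,-12} {1}", pair.Key, pair.Value);
    }
}
```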
Gonzalo
Windows has a ported version of that Unix command line tool built in. However, while there's a pair of registry entries that can be used to adjust the port limits, the right answer is still to fix the code so that it reuses ports.
Steven Sudit
By way of explanation, Pipelined means multiple requests are sent on the same connection without waiting for the first response. Since setting KeepAlive to false means the connection shouldn't be reused, it defeats Pipelined.
Steven Sudit
A: 

It's easier to show an example of what I meant in the comments. It's not my own work, but the guys at Microsoft do such a sweet job that I'll just pass you the link.

http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.begingetrequeststream.aspx

If you're doing heavy I/O over HTTP, I would always suggest looking at callback mechanisms.

Also make sure you close those damn HttpWebResponse objects and their streams. Wrap everything up in bubble wrap by using "using" statements liberally.

Multi-threaded operations: by default there is a limit of 2 connections per host.
That setting can be changed. If the maximum number of connections is in use, further HttpWebRequest operations (request/response) are queued until a connection slot becomes available.

An article I came across about web services might also be relevant to your problem, as the causes are very similar. Here's a link:

http://support.microsoft.com/kb/821268

WeNeedAnswers
A: 

Try adding the following to your app.config, inside the configuration element. I think this solved a similar problem I had when making a lot of HTTP connections repeatedly:

  <system.net>
    <defaultProxy enabled="false">
    </defaultProxy>
    <connectionManagement>
      <remove address="*"/>
      <add address="*" maxconnection="1000" />
    </connectionManagement>
  </system.net>

Edit: I think the defaultProxy-tag was the really, really crucial tag.
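
If editing app.config isn't convenient, roughly the same two settings can be applied in code at startup. This is only a sketch: the `NetworkDefaults` class is a made-up name, and I'm assuming that clearing `WebRequest.DefaultWebProxy` is close enough to `defaultProxy enabled="false"` for this purpose.

```csharp
using System.Net;

static class NetworkDefaults
{
    // Call once at startup, before the first request is created.
    public static void Apply()
    {
        // Rough counterpart of <defaultProxy enabled="false">:
        // skip proxy auto-detection for requests created afterwards.
        WebRequest.DefaultWebProxy = null;

        // Counterpart of <add address="*" maxconnection="1000" />:
        // raise the per-host connection limit for new ServicePoints.
        ServicePointManager.DefaultConnectionLimit = 1000;
    }
}
```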

Onkelborg