
I need to test around 300 URLs to check whether each one leads to an actual page or redirects to some other page. I wrote a simple .NET 2.0 application that checks them with HttpWebRequest. Here's the relevant snippet:

System.Net.HttpWebRequest wr = (System.Net.HttpWebRequest)System.Net.HttpWebRequest.Create( url );
System.Net.HttpWebResponse resp = (System.Net.HttpWebResponse)wr.GetResponse();
code = resp.StatusDescription;

The code ran fast and wrote to the file that every one of my URLs returned 200 OK. Then I realized that, by default, GetResponse() follows redirects. Silly me! So I added one line to make it work properly:

System.Net.HttpWebRequest wr = (System.Net.HttpWebRequest)System.Net.HttpWebRequest.Create( url );
wr.AllowAutoRedirect = false;
System.Net.HttpWebResponse resp = (System.Net.HttpWebResponse)wr.GetResponse();
code = resp.StatusDescription;

I ran the program again and waited... waited... waited... It turned out that for each URL I was getting a System.Net.WebException: "The operation has timed out". Surprised, I checked the URLs manually, and they work fine. When I comment out the AllowAutoRedirect = false line, everything works again; when I uncomment it, I get the timeout. Any ideas what might cause this problem and how to work around it?

A:

Timeouts like this are often caused by web responses not being disposed. You should wrap your HttpWebResponse in a using statement:

using (HttpWebResponse resp = (HttpWebResponse)wr.GetResponse())
{
    code = resp.StatusDescription;
    // ...
}

We'd need to do more analysis to predict whether that's definitely the problem... or you could just try it :)

The reason is that .NET has a connection pool, and if you don't close the response, its connection isn't returned to the pool (at least not until the GC finalizes the response). That leads to a hang: the next request waits for a free connection until it times out.
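For context, in .NET Framework client applications the pool allows only two concurrent connections per host by default (ServicePointManager.DefaultConnectionLimit), so a couple of undisposed responses to the same host are enough to make later requests wait. Below is a minimal sketch of the whole check with the disposal fix applied, written in C# 2.0 style to match the question. The names UrlChecker, Check and Describe, and the 10-second timeout, are hypothetical choices, not from the original post. It relies on documented HttpWebRequest behavior: with AllowAutoRedirect = false, 3xx responses are returned instead of followed, and 4xx/5xx responses surface as a WebException whose Response property holds an HttpWebResponse that also needs disposing.

```csharp
using System;
using System.Net;

// Hypothetical helper (not from the original post): checks one URL without
// following redirects and always disposes whatever response comes back.
public static class UrlChecker
{
    public static string Check(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.AllowAutoRedirect = false; // report 3xx instead of following it
        request.Timeout = 10000;           // assumption: 10 s per URL is acceptable

        try
        {
            // Disposing the response returns its connection to the pool.
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                return Describe(response);
            }
        }
        catch (WebException ex)
        {
            // 4xx/5xx statuses arrive as an exception that still carries a
            // response; it holds a connection too, so dispose it as well.
            HttpWebResponse error = ex.Response as HttpWebResponse;
            if (error != null)
            {
                using (error)
                {
                    return Describe(error);
                }
            }
            // No response at all (DNS failure, refused connection, timeout...).
            return "ERROR " + ex.Status;
        }
    }

    private static string Describe(HttpWebResponse response)
    {
        int code = (int)response.StatusCode;
        string result = code + " " + response.StatusDescription;
        if (code >= 300 && code < 400)
        {
            // For redirects, record where the server wants to send us.
            result += " -> " + response.Headers["Location"];
        }
        return result;
    }
}
```

Calling UrlChecker.Check for each of the 300 URLs in a loop and writing the returned string to the output file should then report redirects as, for example, "301 Moved Permanently -> ..." rather than hanging, since no connection is left checked out of the pool.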

Jon Skeet
No idea why it worked OK without AllowAutoRedirect = false, or how it knew on the first request that I wouldn't close the response properly, but it helped!
@perpetka: When it was redirecting, it was probably closing the original connection and had enough *different* hosts to redirect to that it avoided the problem.
Jon Skeet
Makes sense. Thanks very much for your help!