Hello. Take a look at this:

WebClient client = new WebClient();
WebRequestSettings wrs = new WebRequestSettings(new URL("http://stackoverflow.com/ping/?what-the-duck?"), HttpMethod.HEAD);
client.getPage(wrs);

Running this code throws a FileNotFoundException because the HTTP status code of the page is 404, and the same page is then requested again with the GET method, with the User-Agent set to Java-.... Why does it GET the page (this doesn't happen with "normal" status codes)? Is this a bug? Thanks.

Here is the entire server response:

HTTP/1.1 404 Not Found
Cache-Control: private
Content-Length: 7502
Content-Type: text/html; charset=utf-8
Server: Microsoft-IIS/7.5
Date: Thu, 11 Feb 2010 14:12:11 GMT

Where does it tell the client to GET something? And how can I force WebClient to ignore it?

Here's a screenshot of HTTPDebugger: [screenshot]

The problem is that I don't understand why the second request is sent, or why it is sent with a different User-Agent.

A: 

You execute a HEAD request, which returns a response with no content. HtmlUnit nevertheless tries to create a page from it. To do so, it creates an input source from the URL and the content (which is null) and hands it to a parser. When the parser tries to parse the input source, it sees the null content and uses the URL to retrieve the content anew. So it is not actually HtmlUnit that makes the second request, it is the XML parser. That is also why the User-Agent is Java's and not HttpClient's.
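
The parser behaviour described above can be reproduced without HtmlUnit. The following is a minimal sketch, assuming the JDK's built-in JAXP parser behaves like the parser HtmlUnit uses in this respect (an assumption, not something the source confirms). The class name ParserFetchDemo, the throwaway local server, and the /page path are hypothetical. It hands the parser an InputSource that carries only a system ID (the URL) and no stream, then records which request reaches the server.

```java
import com.sun.net.httpserver.HttpServer;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; not part of HtmlUnit.
public class ParserFetchDemo {

    /**
     * Parses an InputSource that has a system ID but no content stream and
     * returns "METHOD User-Agent" for every request the local server received.
     */
    static List<String> requestsSeenByServer() {
        List<String> seen = Collections.synchronizedList(new ArrayList<>());
        try {
            // Throwaway local server on an ephemeral port (assumption: stands in for the real site).
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                seen.add(exchange.getRequestMethod() + " "
                        + exchange.getRequestHeaders().getFirst("User-Agent"));
                byte[] body = "<root/>".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
                exchange.close();
            });
            server.start();
            try {
                String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/page";
                // Only the system ID is set; no byte or character stream is supplied,
                // so the parser has to open the URL itself to obtain the content.
                InputSource source = new InputSource(url);
                DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
            } finally {
                server.stop(0);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        requestsSeenByServer().forEach(System.out::println);
    }
}
```

On a stock JDK the recorded request is a GET whose User-Agent contains Java/, which matches the second request in the question. The same mechanism would explain the FileNotFoundException: java.net.HttpURLConnection throws it from getInputStream() when the response status is 404.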

lexicore