I'm trying to use HEAD requests to follow 302 redirects. However, this link: http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNGrJk-F7Dmshmtze2yhifxRsv8sRg&url=http://www.mtv.com/news/articles/1647243/20100907/story.jhtml

is troublesome because a HEAD request returns a 200 OK, while a GET request returns the expected 302 status code.

So I'll need to do a GET request, but I'd rather not pay for the extra bandwidth and time that come with downloading the entire HTML document. Does anyone know a hack to do a GET without getting the body returned?

UPDATE: I took David's advice and sent a Range header, but the server still seems to ignore it:

GET /news/url?sa=t&fd=R&usg=AFQjCNGrJk-F7Dmshmtze2yhifxRsv8sRg&url=http://www.mtv.com/news/articles/1647243/20100907/story.jhtml HTTP/1.1
Range: bytes=0-10
x-ms-range: 0-600
Host: news.google.com
Connection: Keep-Alive
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)

HTTP/1.1 302 Moved Temporarily
Content-Type: text/html; charset=UTF-8
Location: http://www.mtv.com/news/articles/1647243/20100907/story.jhtml
Content-Length: 258
Date: Wed, 08 Sep 2010 20:28:16 GMT
Expires: Wed, 08 Sep 2010 20:28:16 GMT
Cache-Control: private, max-age=0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
Server: GSE
Set-Cookie: PREF=ID=ef5f1bc768645c5e:TM=1283977696:LM=1283977696:S=5n26IrEDpcQTJIb1; expires=Fri, 07-Sep-2012 20:28:16 GMT; path=/; domain=.google.com

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="http://www.mtv.com/news/articles/1647243/20100907/story.jhtml">here</A>.
</BODY></HTML>
A: 

1) File a bug with the web server's owner.

2) Try using the Range header in your request.

3) If that doesn't work, can you just hang up the connection after you get the headers you want?
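
For what it's worth, option 3 could look roughly like the sketch below. It uses Python's standard `http.client`, where `getresponse()` parses only the status line and headers, so the connection can be closed before the body is read. The function name `get_headers_only` is made up for illustration, and the sketch assumes a plain `host[:port]` URL with no credentials in it. Note that for a tiny body like the 258-byte one above, the server may well have sent it in the same packet as the headers, so the savings mostly show up on large responses.

```python
import http.client
from urllib.parse import urlsplit


def get_headers_only(url, timeout=10.0):
    """Send a GET, read only the status line and headers, then hang up.

    Returns (status_code, location_header_or_None).
    Hypothetical helper, not part of any library.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        # Rebuild the request target: path plus the original query string.
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        # Ask the server not to keep the connection open; we will close it anyway.
        conn.request("GET", target, headers={"Connection": "close"})
        # getresponse() returns once the headers are parsed; the body is not read here.
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        # Closing without calling resp.read() drops whatever body is left unread.
        conn.close()
```

Calling `get_headers_only(some_url)` on a redirecting URL should give back the status code and the `Location` value, which you can then follow yourself.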

David M.
Thanks David, I didn't know about the Range header. I'm now sending it, but they're still ignoring it, unless I didn't specify it correctly. I've updated my post with the results.
I guess it's just a cheapy web service. Try hanging up the socket? You could also just use the Location header regardless of the HTTP status.
David M.
On the HEAD request they don't send back a Location header, unfortunately.
A: 

In the specific example you cite, you could just pull it out of the original URL's "url" parameter. But for a more generic approach, I'd stick to David M.'s suggestions.
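
As a rough illustration of pulling the target out of the query string, here is a small Python sketch using the standard `urllib.parse` module. The helper name `embedded_target` and its `param` argument are my own; it assumes the target sits in a single query parameter and does not itself contain an unescaped `&`, otherwise the value would be cut short.

```python
from urllib.parse import urlsplit, parse_qs


def embedded_target(url, param="url"):
    """Return the first value of the given query parameter, or None if absent.

    Hypothetical helper for redirector links that carry the destination
    in their own query string.
    """
    values = parse_qs(urlsplit(url).query).get(param)
    return values[0] if values else None


link = ("http://news.google.com/news/url?sa=t&fd=R"
        "&usg=AFQjCNGrJk-F7Dmshmtze2yhifxRsv8sRg"
        "&url=http://www.mtv.com/news/articles/1647243/20100907/story.jhtml")
print(embedded_target(link))
```

This avoids the network round trip entirely, but only for redirectors whose URL format you already know.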

Marc Novakowski
That only works for that one example; I'm dealing with millions of random URLs currently.