I'm writing a dead link detector and wondering which library I should use, httplib or urllib, so I tried both.

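import httplib
import urllib
from urlparse import urlparse

def http_response_code(url):
    parts = urlparse(url)
    # The request target is the path plus the query string, if any.
    req = parts.path or '/'
    if parts.query:
        req += '?' + parts.query
    conn = httplib.HTTPConnection(parts.netloc)
    conn.request('HEAD', req)
    res = conn.getresponse()
    return res.status, res.reason

def urllib_response_code(url):
    # Status code that urllib reports for this URL.
    a = urllib.urlopen(url)
    return a.getcode()


def main():
    url = 'http://1010wins.com/content_page.php?contenttype=4&contentid=6077355'
    print http_response_code(url)
    print urllib_response_code(url)

if __name__ == '__main__':
    main()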

But I'm confused by the mismatch between the results returned by the two functions above, which are:

(302, 'Found')
200

I think maybe it's because urllib automatically detects the redirect, fetches the final destination page, and then returns that page's response code?

Thanks.

+3  A: 

302 is the HTTP status code for a redirect, and httplib (the lower-level library) returns it faithfully, while urllib automatically follows the redirect and gives you the status code of the final response (200, meaning "OK").
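
To make the difference concrete, here is a minimal sketch, not the original poster's code. It assumes Python 3, where httplib is named http.client and urlopen lives in urllib.request. The local server, the /old and /new paths, and the function names raw_status, followed_status and demo are hypothetical, chosen only so the example runs without touching the network.

```python
import http.client
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /old redirects to /new, /new answers 200."""

    def _respond(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
        elif self.path == "/new":
            self.send_response(200)
        else:
            self.send_response(404)
        self.send_header("Content-Length", "0")
        self.end_headers()

    # Serve GET and HEAD identically (no body in either case).
    do_GET = _respond
    do_HEAD = _respond

    def log_message(self, *args):
        pass  # keep the demo quiet


def raw_status(url):
    """Low-level request: report whatever status the server sends first."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    conn = http.client.HTTPConnection(parts.netloc, timeout=5)
    try:
        conn.request("HEAD", target)
        return conn.getresponse().status
    finally:
        conn.close()


def followed_status(url):
    """High-level request: redirects are followed before the status is read."""
    # An empty ProxyHandler bypasses any system proxy for this local demo.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as resp:
        return resp.status, resp.geturl()


def demo():
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/old" % server.server_port
        return raw_status(url), followed_status(url)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

For a dead link detector, the low-level call shows that a link redirects, while the high-level call shows whether the final destination is reachable; comparing the URL returned by geturl() with the one requested is one way to notice that a redirect happened.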

Pick the library that best suits the abstraction layer you want to work at -- httplib gives you a lot more control, but it's less general (won't do anything with URLs with other protocols such as ftp:, etc, for example) and lower-level (so you have to do a bit more work!-).

Alex Martelli