I'm looking for a quick way to get an HTTP response code from a URL (200, 404, etc.). Not sure which library to use.

Thanks

+4  A: 

You should use urllib2, like this:

import urllib2
for url in ["http://entrian.com/", "http://entrian.com/does-not-exist/"]:
    try:
        connection = urllib2.urlopen(url)
        print connection.getcode()
        connection.close()
    except urllib2.HTTPError, e:
        print e.code  # HTTPError always carries .code; getcode() may be missing on older versions

# Prints:
# 200 [from the try block]
# 404 [from the except block]
RichieHindle
+10  A: 

Here's a solution that uses httplib instead.

import httplib

def get_status_code(host, path="/"):
    """ This function retrieves the status code of a website by requesting
        HEAD data from the host. This means that it only requests the headers.
        If the host cannot be reached or something else goes wrong, it returns
        None instead.
    """
    try:
        conn = httplib.HTTPConnection(host)
        conn.request("HEAD", path)
        return conn.getresponse().status
    except StandardError:
        return None


print get_status_code("stackoverflow.com") # prints 200
print get_status_code("stackoverflow.com", "/nonexistant") # prints 404
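
For readers on Python 3, where httplib was renamed http.client, the same HEAD approach might look roughly like the sketch below. The function name head_status and the port and timeout parameters are illustrative assumptions, not part of the code above.

```python
import http.client


def head_status(host, path="/", port=None, timeout=10):
    """Return the status code of a HEAD request, or None on failure.

    head_status, port and timeout are hypothetical names chosen for
    this sketch; only the HEAD technique comes from the answer above.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # HEAD asks for the headers only, so no response body is transferred.
        conn.request("HEAD", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Covers refused connections, DNS failures, timeouts and
        # malformed responses.
        return None
    finally:
        conn.close()
```

Note that, like the original, this does not follow redirects, so a site that redirects will report its 3xx code rather than the final page's status.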
Evan Fosmark
+1 for HEAD request — no need to retrieve the entire entity for a status check.
Ben Blank
Although you really should restrict that `except` block to at least `StandardError` so that you don't incorrectly catch things like `KeyboardInterrupt`.
Ben Blank
Good idea, Ben. I updated it accordingly.
Evan Fosmark
A: 

urllib2.HTTPError does not have a getcode() method. Use the code attribute instead.
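
A minimal sketch of reading the code attribute, written for Python 3, where urllib2 was split into urllib.request and urllib.error. The function name url_status_code and the timeout parameter are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def url_status_code(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived.

    url_status_code and timeout are hypothetical names for this sketch.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.getcode()
    except urllib.error.HTTPError as e:
        # 4xx and 5xx responses are raised as HTTPError; the status
        # is available on the exception's code attribute.
        return e.code
    except urllib.error.URLError:
        # No HTTP response at all, e.g. DNS failure or refused connection.
        return None
```

HTTPError has to be caught before URLError because it is a subclass of it. urlopen follows redirects by default, so this reports the status of the final response rather than any intermediate 3xx.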

Husio