
Does urllib fetch the whole page when a urlopen call is made?

I'd like to read just the HTTP response headers without fetching the page. It looks like urllib opens the HTTP connection and then subsequently gets the actual HTML page... or does it just start buffering the page with the urlopen call?

import urllib2
myurl = 'http://bit.ly/doFeT'
page = urllib2.urlopen(myurl)  # open connection, get headers

html = page.readlines()  # stream page
+6  A: 

What about sending a HEAD request instead of a normal GET request? The following snippet (copied from a similar question) does exactly that.

>>> import httplib
>>> conn = httplib.HTTPConnection("www.google.com")
>>> conn.request("HEAD", "/index.html")
>>> res = conn.getresponse()
>>> print res.status, res.reason
200 OK
>>> print res.getheaders()
[('content-length', '0'), ('expires', '-1'), ('server', 'gws'), ('cache-control', 'private, max-age=0'), ('date', 'Sat, 20 Sep 2008 06:43:36 GMT'), ('content-type', 'text/html; charset=ISO-8859-1')]
reto
I think that would do it!
shigeta
+4  A: 

urllib2.urlopen does an HTTP GET (or POST if you supply a data argument), not an HTTP HEAD (if it did the latter, you couldn't do readlines or other accesses to the page body, of course).
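
If you specifically want urllib2 to send a HEAD request, a common workaround is to override Request.get_method. A minimal sketch, assuming Python 2; HeadRequest is an illustrative name, not part of the library:

import urllib2

class HeadRequest(urllib2.Request):
    # Illustrative subclass: report HEAD as the method so urllib2's
    # handler issues a HEAD request instead of a GET.
    def get_method(self):
        return "HEAD"

response = urllib2.urlopen(HeadRequest('http://www.example.com'))
print response.info()  # headers only; a HEAD response carries no body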

Alex Martelli
Oh, thanks - that answers the question.
shigeta
+2  A: 

An example found in the Python docs.

Quoting from the site:

Here is an example session that uses the GET method:

>>> import httplib
>>> conn = httplib.HTTPConnection("www.python.org")
>>> conn.request("GET", "/index.html")
>>> r1 = conn.getresponse()
>>> print r1.status, r1.reason
200 OK
>>> data1 = r1.read()
>>> conn.request("GET", "/parrot.spam")
>>> r2 = conn.getresponse()
>>> print r2.status, r2.reason
404 Not Found
>>> data2 = r2.read()
>>> conn.close()

Here is an example session that shows how to POST requests:

>>> import httplib, urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> headers = {"Content-type": "application/x-www-form-urlencoded",
...            "Accept": "text/plain"}
>>> conn = httplib.HTTPConnection("musi-cal.mojam.com:80")
>>> conn.request("POST", "/cgi-bin/query", params, headers)
>>> response = conn.getresponse()
>>> print response.status, response.reason
200 OK
>>> data = response.read()
>>> conn.close()
dassouki
+3  A: 

Use the response.info() method to get the headers.

From the urllib2 docs:

urllib2.urlopen(url[, data][, timeout])

...

This function returns a file-like object with two additional methods:

  • geturl() — return the URL of the resource retrieved, commonly used to determine if a redirect was followed
  • info() — return the meta-information of the page, such as headers, in the form of an httplib.HTTPMessage instance (see Quick Reference to HTTP Headers)

So, for your example, try looping over the result of response.info().headers to find what you're looking for (a short sketch follows).
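
A minimal sketch of that approach, assuming Python 2 (the URL is the one from the question):

import urllib2

page = urllib2.urlopen('http://bit.ly/doFeT')

# info() returns an httplib.HTTPMessage; its .headers attribute is the
# list of raw header lines from the response
for header in page.info().headers:
    print header.strip()

# or look up a single header by name (case-insensitive)
print page.info().getheader('Content-Type')

Note that urlopen still issues a GET, so the server sends the body regardless of whether you read it; only the HEAD approaches above avoid transferring it.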

Note that the major caveat to using httplib.HTTPMessage is documented in Python issue 4773.

tolmeda