ansaurus

Question

How do you send a HEAD HTTP request in Python?

Answer 1

+28 A:

>>> import httplib
>>> conn = httplib.HTTPConnection("www.google.com")
>>> conn.request("HEAD", "/index.html")
>>> res = conn.getresponse()
>>> print res.status, res.reason
200 OK
>>> print res.getheaders()
[('content-length', '0'), ('expires', '-1'), ('server', 'gws'), ('cache-control', 'private, max-age=0'), ('date', 'Sat, 20 Sep 2008 06:43:36 GMT'), ('content-type', 'text/html; charset=ISO-8859-1')]

There's also a getheader(name) method to get a specific header.
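In Python 3 the httplib module was renamed http.client, and the same calls work there. The sketch below is a self-contained version of the session above. It assumes a throwaway local server in place of www.google.com so it runs offline, and uses getheader() to read one header by name.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in for a web server: answers HEAD with headers only."""

    def do_HEAD(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=ISO-8859-1")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # silence the default per-request logging to stderr


server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

conn = http.client.HTTPConnection(host, port)
conn.request("HEAD", "/index.html")
res = conn.getresponse()
print(res.status, res.reason)                  # 200 OK
headers = res.getheaders()                     # list of (name, value) tuples
content_type = res.getheader("content-type")   # one header, name is case-insensitive
conn.close()

server.shutdown()
server.server_close()
```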

Eevee 2008-09-20 06:45:45

that's beautiful. :) thank you.

fuentesjr 2008-09-20 06:48:05

fluent: remember to mark the question as "accepted"

John Millikin 2008-09-20 19:00:58

Answer 2

A:

Probably easier: use urllib or urllib2.

>>> import urllib
>>> f = urllib.urlopen('http://google.com')
>>> f.info().gettype()
'text/html'

f.info() is a dictionary-like object, so you can do f.info()['content-type'], etc.
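A Python 3 sketch of the same idea, where the urlopen() functions of urllib and urllib2 now live in urllib.request. Like the session above, it performs a GET. Assumptions: a hypothetical local server stands in for google.com, and the opener is built with an empty ProxyHandler so proxy environment variables don't intercept the local request. In Python 3, info() returns an http.client.HTTPMessage, and get_content_type() plays roughly the role of gettype().

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in server returning a tiny HTML page."""

    def do_GET(self):
        body = b"<html></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

# Opener without proxies, so *_proxy environment variables don't reroute
# this local request (only needed for the demo).
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(f"http://{host}:{port}/") as f:
    info = f.info()                    # http.client.HTTPMessage
    mime = info.get_content_type()     # roughly what gettype() returned in Python 2
    raw = info["content-type"]         # dictionary-style, case-insensitive lookup

server.shutdown()
server.server_close()
```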

http://docs.python.org/library/urllib.html
http://docs.python.org/library/urllib2.html
http://docs.python.org/library/httplib.html

The docs note that httplib is not normally used directly.

2008-12-11 00:11:48

However, urllib sends a GET, and the question is about performing a HEAD. Maybe the poster does not want to retrieve an expensive document.
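To make the difference concrete, here is a Python 3 sketch (http.client, the renamed httplib) that requests the same resource with GET and with HEAD. The local server and its 100,000-byte body are hypothetical. Both responses carry the same Content-Length header, but only GET actually transfers the bytes.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

BODY = b"x" * 100_000  # hypothetical "expensive document"


class Handler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_HEAD(self):
        self._send_headers()           # same headers as GET, but no body

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass


server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address


def fetch(method):
    """Send one request; return (Content-Length header, bytes actually received)."""
    conn = http.client.HTTPConnection(host, port)
    try:
        conn.request(method, "/big")
        res = conn.getresponse()
        body = res.read()              # b"" for a HEAD response
        return res.getheader("Content-Length"), len(body)
    finally:
        conn.close()


get_result = fetch("GET")
head_result = fetch("HEAD")
print("GET ", get_result)
print("HEAD", head_result)

server.shutdown()
server.server_close()
```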

Bluebird75 2009-05-06 08:30:32

Answer 3

+1 A:

As an aside, when using httplib (at least on Python 2.5.2), trying to read the response to a HEAD request will block (on readline) and subsequently fail. However, if you do not call read() on the response, you cannot send another request on the same connection: you will need to open a new one, or accept a long delay between requests.
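For comparison, a sketch against Python 3's http.client rather than 2.5.2. As we understand the current module, read() on a HEAD response returns b"" without waiting for a body, which marks the response finished so the connection can be reused. The local HTTP/1.1 keep-alive server is an assumption for a self-contained test; recording each request's client address is how the sketch checks that both requests share one TCP connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

seen = []  # (client ip, client port) for every request the server handles


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"      # keep connections open between requests

    def do_HEAD(self):
        seen.append(self.client_address)
        self.send_response(200)
        self.send_header("Content-Length", "1234")  # size a GET would send...
        self.end_headers()                          # ...but HEAD sends no body

    def log_message(self, *args):
        pass


server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

conn = http.client.HTTPConnection(host, port)
statuses, bodies = [], []
for path in ("/first", "/second"):
    conn.request("HEAD", path)
    res = conn.getresponse()
    # For HEAD, read() returns b"" right away and marks the response finished;
    # http.client requires that before it hands out the next response.
    bodies.append(res.read())
    statuses.append(res.status)
conn.close()

server.shutdown()
server.server_close()
```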

2009-04-23 01:39:05

Answer 4

+20 A:

urllib2 can be used to perform a HEAD request. This is a little nicer than using httplib since urllib2 parses the URL for you instead of requiring you to split the URL into host name and path.

>>> import urllib2
>>> class HeadRequest(urllib2.Request):
...     def get_method(self):
...         return "HEAD"
... 
>>> response = urllib2.urlopen(HeadRequest("http://google.com/index.html"))

Headers are available via response.info() as before. Interestingly, you can find the URL that you were redirected to:

>>> print response.geturl()
http://www.google.com.au/index.html
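In Python 3.3 and later the subclass is not needed: urllib.request.Request accepts a method argument (overriding get_method() in a subclass still works too). The sketch assumes a hypothetical local server whose /index.html answers with a 301 to /moved/index.html, so geturl() reports the redirected URL without touching the network. The server answers GET the same way because some Python versions may re-issue a redirected HEAD as a GET.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    """Hypothetical local server: /index.html redirects to /moved/index.html."""

    def do_HEAD(self):
        if self.path == "/index.html":
            self.send_response(301)
            self.send_header("Location", "/moved/index.html")
        else:
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", "0")
        self.end_headers()

    # Handle GET identically in case the redirect is re-issued as a GET.
    do_GET = do_HEAD

    def log_message(self, *args):
        pass


server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

# Python 3.3+: pass method="HEAD" instead of subclassing Request.
req = urllib.request.Request(f"http://{host}:{port}/index.html", method="HEAD")
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # ignore env proxies
with opener.open(req) as response:
    print(response.status, response.info()["Content-Type"])
    final_url = response.geturl()      # URL after following the redirect

server.shutdown()
server.server_close()
```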

doshea 2010-01-15 10:50:52

really nice answer, thx

Matt Joiner 2010-02-22 14:51:09

response.info().__str__() returns the headers as a single string, in case you want to do something with the result.
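In Python 3 the equivalent is str(response.info()), and items() gives the same headers as (name, value) pairs. A minimal sketch against a hypothetical local server:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_HEAD(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass


server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # ignore env proxies
req = urllib.request.Request(f"http://{host}:{port}/", method="HEAD")
with opener.open(req) as response:
    header_text = str(response.info())       # "Name: value" lines, same as __str__()
    header_pairs = response.info().items()   # the same headers as (name, value) tuples

print(header_text)
server.shutdown()
server.server_close()
```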

Shane 2010-10-12 12:17:18

Answer 5

A:

I have found that httplib is faster than urllib2. I timed two programs, one using httplib and the other using urllib2, each sending HEAD requests to 10,000 URLs. The httplib one finished about 2 minutes 40 seconds sooner (wall-clock). httplib's total stats were:

real 6m21.334s
user 0m2.124s
sys 0m16.372s

And urllib2's total stats were:

real 9m1.380s
user 0m16.666s
sys 0m28.565s
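A rough Python 3 harness for repeating this kind of comparison (http.client and urllib.request, the renamed httplib and urllib2). It is only a sketch: the URL list is hypothetical and points at a local server, so the absolute numbers will not resemble a run against 10,000 real hosts, and no particular result is implied.

```python
import http.client
import threading
import time
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_HEAD(self):
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass


server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address
urls = [f"http://{host}:{port}/page/{i}" for i in range(50)]  # hypothetical URL list


def head_http_client(url):
    """HEAD via http.client: we split the URL ourselves (query ignored); no redirects."""
    parts = urllib.parse.urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port)
    try:
        conn.request("HEAD", parts.path or "/")
        return conn.getresponse().status
    finally:
        conn.close()


opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def head_urllib(url):
    """HEAD via urllib.request: the opener parses the URL and follows redirects."""
    with opener.open(urllib.request.Request(url, method="HEAD")) as response:
        return response.status


def timed(fn):
    start = time.perf_counter()
    statuses = [fn(url) for url in urls]
    return time.perf_counter() - start, statuses


results = {
    "http.client": timed(head_http_client),
    "urllib.request": timed(head_urllib),
}
for name, (elapsed, statuses) in results.items():
    print(f"{name:15} {elapsed:.3f}s for {len(statuses)} requests")

server.shutdown()
server.server_close()
```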

Does anybody else have input on this?

PythonUser 2010-04-13 15:10:00

Input? The problem is IO-bound and you're using blocking libraries. Switch to eventlet or twisted if you want better performance. The limitations of urllib2 you mention are CPU-bound.
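eventlet and Twisted are third-party libraries. As a swapped-in standard-library alternative (not what the comment proposes), a thread pool from concurrent.futures also overlaps the waiting in I/O-bound HEAD requests. The sketch assumes a hypothetical local server that delays every HEAD by 200 ms, so 20 requests sent one by one would take at least 4 seconds.

```python
import concurrent.futures
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SlowHandler(BaseHTTPRequestHandler):
    def do_HEAD(self):
        time.sleep(0.2)                # simulated latency (assumption for the demo)
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass


class Server(ThreadingHTTPServer):
    request_queue_size = 64            # accept many simultaneous connections


server = Server(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address


def head_status(path):
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


paths = [f"/page/{i}" for i in range(20)]
start = time.perf_counter()
# The work is I/O-bound, so threads overlap the waiting despite the GIL.
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as pool:
    statuses = list(pool.map(head_status, paths))
elapsed = time.perf_counter() - start
print(f"{len(statuses)} HEAD requests in {elapsed:.2f}s (at least 4s sequentially)")

server.shutdown()
server.server_close()
```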

Devin Jeanpierre 2010-08-13 01:04:50

urllib2 follows redirects, so if some of your URLs redirect, that is probably the reason for the difference. Also, httplib is lower-level; urllib2 does extra work such as parsing the URL.
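A small Python 3 sketch of that difference, assuming a hypothetical local server where /old redirects to /new and every request path is recorded. http.client (httplib) returns the 301 to the caller after one request, while the default urllib.request opener (urllib2) follows it, which costs a second request.

```python
import http.client
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

hits = []  # request paths the server has seen, in order


class Handler(BaseHTTPRequestHandler):
    def do_HEAD(self):
        hits.append(self.path)
        if self.path == "/old":
            self.send_response(301)
            self.send_header("Location", "/new")
        else:
            self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_GET = do_HEAD  # in case a redirected HEAD is re-sent as a GET

    def log_message(self, *args):
        pass


server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

# http.client: the redirect comes back to the caller unchanged.
conn = http.client.HTTPConnection(host, port)
conn.request("HEAD", "/old")
res = conn.getresponse()
raw_status, location = res.status, res.getheader("Location")
conn.close()
requests_http_client = len(hits)

# urllib.request: the default opener follows the redirect automatically.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
req = urllib.request.Request(f"http://{host}:{port}/old", method="HEAD")
with opener.open(req) as response:
    followed_status = response.status
requests_urllib = len(hits) - requests_http_client

print(raw_status, location, requests_http_client)
print(followed_status, requests_urllib)

server.shutdown()
server.server_close()
```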

Marian 2010-08-25 22:05:34
