



Background: I am using urllib.urlretrieve, as opposed to any other function in the urllib* modules, because of its hook function support (see reporthook below), which I use to display a textual progress bar. This is Python >= 2.6.

urllib.urlretrieve(url[, filename[, reporthook[, data]]])
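A minimal sketch of such a progress-bar hook, based on urlretrieve's documented reporthook contract: the hook is called once when the connection is established and once after each block read, with the number of blocks transferred so far, the block size in bytes, and the total size (which can be -1 when the size is unknown). The names render_bar and report_hook are hypothetical. The code avoids the print statement, so it should run unchanged on Python 2.6 and Python 3.

```python
import sys


def render_bar(block_num, block_size, total_size, width=40):
    """Return a text progress bar for urlretrieve's reporthook arguments."""
    if total_size <= 0:
        # size unknown (total_size == -1): only report bytes received so far
        return "%d bytes" % (block_num * block_size)
    # the last block may be partial, so cap the byte count at the total
    done = min(block_num * block_size, total_size)
    filled = width * done // total_size
    percent = 100 * done // total_size
    return "[%s%s] %3d%%" % ("#" * filled, "." * (width - filled), percent)


def report_hook(block_num, block_size, total_size):
    # "\r" returns to the start of the line so the bar redraws in place
    sys.stdout.write("\r" + render_bar(block_num, block_size, total_size))
    sys.stdout.flush()
```

Usage would be `urllib.urlretrieve(url, "page.html", report_hook)` on Python 2, or `urllib.request.urlretrieve(...)` with the same arguments on Python 3.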

However, urlretrieve is so dumb that it gives no way to detect the status of the HTTP request (e.g., was it a 404 or a 200?).

>>> fn, h = urllib.urlretrieve('')
>>> h.items() 
[('date', 'Thu, 20 Aug 2009 20:07:40 GMT'),
 ('expires', '-1'),
 ('content-type', 'text/html; charset=ISO-8859-1'),
 ('server', 'gws'),
 ('cache-control', 'private, max-age=0')]
>>> h.status

What is the best known way to download a remote HTTP file with hook-like support (to show a progress bar) and decent HTTP error handling?

A:

Check out urllib.urlretrieve's complete code:

def urlretrieve(url, filename=None, reporthook=None, data=None):
  global _urlopener
  if not _urlopener:
    _urlopener = FancyURLopener()
  return _urlopener.retrieve(url, filename, reporthook, data)

In other words, you can use urllib.FancyURLopener (it's part of the public urllib API). You can override http_error_default to detect 404s:

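import urllib

class MyURLopener(urllib.FancyURLopener):
  def http_error_default(self, url, fp, errcode, errmsg, headers):
    # handle errors the way you'd like to; URLopener's version raises IOError
    return urllib.URLopener.http_error_default(
        self, url, fp, errcode, errmsg, headers)

fn, h = MyURLopener().retrieve(url, reporthook=my_report_hook)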
I don't want to specify handlers; does it throw exceptions like urllib2.urlopen?
Sridhar Ratnakumar
It's very easy to make it throw. FancyURLopener subclasses URLopener, which does throw, so you can try calling the base class's implementation: def http_error_default(...): URLopener.http_error_default(...)
This is a very good solution; I used it myself just now.
Christian Davén

The URLopener object's "retrieve" method supports reporthook and throws an exception on 404.

Yes, but it doesn't support redirects, etc.
Sridhar Ratnakumar
A:

You should use:

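import urllib2

try:
    resp = urllib2.urlopen("")
except urllib2.URLError as e:
    # only HTTPError (a URLError subclass) has .code; re-raise the rest,
    # e.g. DNS failures or refused connections
    if not hasattr(e, "code"):
        raise
    # an HTTPError is itself a file-like response, so use it as one
    resp = e

print "Gave", resp.code, resp.msg
print "=" * 80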

Edit: The rationale is that an error you did not plan for is, by definition, exceptional, and you probably didn't even think about it. So rather than letting your code carry on after an unsuccessful request, the default behavior, quite sensibly, is to stop execution by raising.
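In Python 3, urllib2 was split into urllib.request and urllib.error. What follows is a rough port of the snippet above, wrapped in a function. The names fetch_status and its opener parameter are hypothetical. Accepting an opener makes it easy to pass one with custom handlers, for example one with proxies disabled.

```python
import urllib.error
import urllib.request


def fetch_status(url, opener=None):
    """Return (status code, reason phrase) for url; 4xx/5xx do not raise."""
    # default opener: urllib's standard handlers (redirects, proxies, errors)
    opener = opener or urllib.request.build_opener()
    try:
        resp = opener.open(url)
    except urllib.error.HTTPError as e:
        # HTTPError is also a file-like response carrying the status line;
        # other URLErrors (DNS failure, refused connection) still propagate
        code, reason = e.code, e.reason
        e.close()
        return code, reason
    with resp:
        return resp.status, resp.reason
```

Catching HTTPError specifically plays the same role as the `hasattr(e, "code")` check in the urllib2 version.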

hook-like support?
Sridhar Ratnakumar
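On the "hook-like support?" question, one option is to reimplement the small read loop that urlretrieve runs internally, this time on top of urlopen. This is sketched here for Python 3. urlopen follows redirects and raises urllib.error.HTTPError for 4xx/5xx responses. The names download and block_size are hypothetical. The hook receives the same (block_num, block_size, total_size) arguments as urlretrieve's reporthook, with -1 for an unknown size, so a progress-bar hook written for urlretrieve should work unchanged.

```python
import urllib.request


def download(url, filename, reporthook=None, block_size=8192):
    """Fetch url into filename, calling reporthook the way urlretrieve does."""
    # urlopen runs first, so on an HTTP/URL error no output file is created
    with urllib.request.urlopen(url) as resp, open(filename, "wb") as out:
        headers = resp.info()
        # -1 signals an unknown size, mirroring urlretrieve's convention
        total_size = int(headers.get("Content-Length", -1))
        block_num = 0
        if reporthook:
            reporthook(block_num, block_size, total_size)  # connection made
        while True:
            block = resp.read(block_size)
            if not block:
                break
            out.write(block)
            block_num += 1
            if reporthook:
                reporthook(block_num, block_size, total_size)
    return filename, headers
```

Unlike urlretrieve, this sketch does not check for truncated transfers. (urlretrieve raises ContentTooShortError when fewer bytes than Content-Length arrive.)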