views: 645
answers: 5

I have the following code to do a postback to a remote URL:

import httplib
import urllib2

request = urllib2.Request('http://www.example.com', postBackData, {'User-Agent': 'My User Agent'})

try: 
    response = urllib2.urlopen(request)
except urllib2.HTTPError, e:
    checksLogger.error('HTTPError = ' + str(e.code))
except urllib2.URLError, e:
    checksLogger.error('URLError = ' + str(e.reason))
except httplib.HTTPException, e:
    checksLogger.error('HTTPException')

The postBackData is created by encoding a dictionary with urllib.urlencode. checksLogger is a logger created with the logging module.

I have had a problem where, when the remote server is down, this code exits (it runs on customer servers, so I don't know what the exit stack dump / error is at this time). I'm assuming that's because an exception and/or error is being raised that isn't handled. So are there any other exceptions that might be raised that I'm not handling above?

+1  A: 

You can catch all exceptions and log whatever gets caught:

 import sys
 import traceback
 def formatExceptionInfo(maxTBlevel=5):
     cla, exc, trbk = sys.exc_info()
     excName = cla.__name__
     excArgs = getattr(exc, "args", "<no args>")
     excTb = traceback.format_tb(trbk, maxTBlevel)
     return (excName, excArgs, excTb)
 try:
     x = x + 1
 except:
     print formatExceptionInfo()

(Code from http://www.linuxjournal.com/article/5821)

Also read documentation on sys.exc_info.

Eugene Morozov
Better to use "except Exception:" so you don't catch errors that are going to cause problems in your except handler.
S.Lott
Better to not catch exceptions at all if you're just logging them -- see my answer.
Steven Huwig
@S.Lott: thanks, it's a trick I wasn't familiar with
Eugene Morozov
@Steven Huwig: yes, but I find that even using excepthook is clumsy - I'd rather add some logging on the server, for example: somescript.py 2>/var/tmp/scrape.log
Eugene Morozov
+4  A: 

From the urlopen entry on the docs page, it looks like you just need to catch URLError. If you really want to hedge your bets against problems within the urllib code, you can also catch Exception as a fall-back. Do not use a bare except:, since that will catch SystemExit and KeyboardInterrupt as well.

Edit: What I mean to say is, you're catching the errors it's supposed to throw. If it's throwing something else, it's probably because the urllib code isn't catching something it should have caught and wrapped in a URLError. Even the stdlib tends to miss simple things like AttributeError. Catching Exception as a fall-back (and logging what was caught) will help you figure out what's happening, without trapping SystemExit and KeyboardInterrupt.
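
The distinction is easy to demonstrate; a minimal Python 3 sketch (the function and logger names are illustrative, not from the question):

```python
import logging

logging.basicConfig(level=logging.ERROR)
log = logging.getLogger("checks")

def safe_call(fn):
    # "except Exception" catches ordinary runtime errors raised by fn...
    try:
        return fn()
    except Exception as e:
        # ...but NOT SystemExit or KeyboardInterrupt, which subclass
        # BaseException directly and so propagate past this handler.
        log.error("unexpected %s: %s", type(e).__name__, e)
        return None

def broken():
    raise AttributeError("deep inside urllib")

def interrupted():
    raise KeyboardInterrupt

safe_call(broken)  # logged; execution continues
try:
    safe_call(interrupted)
except KeyboardInterrupt:
    print("KeyboardInterrupt escaped the except Exception handler")
```

A bare except: would have swallowed the KeyboardInterrupt too, making the process impossible to stop cleanly with Ctrl-C.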

DNS
+1 for the link to the docs, and the clean advice on exceptions
Jarret Hardie
+1 for letting me know what's the difference between except: and except Exception (at least one of them).
hyperboreean
Does except urllib2.URLError, e: not already catch URLError?
DavidM
+3  A: 

    $ grep "raise" /usr/lib64/python/urllib2.py
    IOError); for HTTP errors, raises an HTTPError, which can also be
            raise AttributeError, attr
                    raise ValueError, "unknown url type: %s" % self.__original
            # XXX raise an exception if no one else should try to handle
            raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
            perform the redirect.  Otherwise, raise HTTPError if no-one
                raise HTTPError(req.get_full_url(), code, msg, headers, fp)
                    raise HTTPError(req.get_full_url(), code,
                raise HTTPError(req.get_full_url(), 401, "digest auth failed",
                    raise ValueError("AbstractDigestAuthHandler doesn't know "
                raise URLError('no host given')
                raise URLError('no host given')
                raise URLError(err)
            raise URLError('unknown url type: %s' % type)
            raise URLError('file not on local host')
                raise IOError, ('ftp error', 'no host given')
                raise URLError(msg)
                raise IOError, ('ftp error', msg), sys.exc_info()[2]
                raise GopherError('no host given')

There is also the possibility of exceptions in urllib2 dependencies, or of exceptions caused by genuine bugs.

You are best off logging all uncaught exceptions to a file via a custom sys.excepthook. The key rule of thumb here is to never catch exceptions you aren't planning to correct, and logging is not a correction. So don't catch them just to log them.

Steven Huwig
+1  A: 

Add a generic exception handler:

import httplib
import traceback
import urllib2

request = urllib2.Request('http://www.example.com', postBackData, {'User-Agent': 'My User Agent'})

try:
    response = urllib2.urlopen(request)
except urllib2.HTTPError, e:
    checksLogger.error('HTTPError = ' + str(e.code))
except urllib2.URLError, e:
    checksLogger.error('URLError = ' + str(e.reason))
except httplib.HTTPException, e:
    checksLogger.error('HTTPException')
except Exception:
    checksLogger.error('generic exception: ' + traceback.format_exc())
vartec
A: 

I catch:

httplib.HTTPException
urllib2.HTTPError
urllib2.URLError

I believe this covers everything, including socket errors: urllib2 wraps socket.error in URLError, and since HTTPError is a subclass of URLError it must be caught first.
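
For reference, the same trio maps directly onto Python 3's renamed modules; a sketch under that assumption (the timeout value is illustrative):

```python
import http.client
import logging
import urllib.error
import urllib.request

logging.basicConfig(level=logging.ERROR)
checksLogger = logging.getLogger("checks")

def post_back(url, data):
    request = urllib.request.Request(url, data, {"User-Agent": "My User Agent"})
    try:
        return urllib.request.urlopen(request, timeout=10)
    except urllib.error.HTTPError as e:
        # HTTPError subclasses URLError, so it must be caught first.
        checksLogger.error("HTTPError = %s", e.code)
    except urllib.error.URLError as e:
        # URLError subclasses OSError and wraps socket-level failures,
        # so connection-refused and DNS errors land here too.
        checksLogger.error("URLError = %s", e.reason)
    except http.client.HTTPException:
        checksLogger.error("HTTPException")
```

In Python 3, socket.error is an alias for OSError, which URLError inherits from, so the socket-error coverage claimed above holds there as well.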

Corey Goldberg