Hi,

I'm having a little trouble writing a script that works with URLs. I'm using urllib.urlopen() to fetch the content of each URL, but some of these URLs require authentication, and urlopen() then prompts me for a username and password. What I need is to ignore every URL that requires authentication: just skip it and continue. Is there a way to do this? I thought about catching the HTTPError exception, but the exception is handled inside urlopen() itself, so that doesn't work.

Thanks for any replies.

+1  A: 

You are right about the urllib2.HTTPError exception:

exception urllib2.HTTPError

Though being an exception (a subclass of URLError), an HTTPError can also function as a non-exceptional file-like return value (the same thing that urlopen() returns). This is useful when handling exotic HTTP errors, such as requests for authentication.

code

An HTTP status code as defined in RFC 2616. This numeric value corresponds to a value found in the dictionary of codes as found in BaseHTTPServer.BaseHTTPRequestHandler.responses.

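The code attribute of the exception can be used to verify that authentication is required (code 401). Note that this relies on urllib2.urlopen(): unlike urllib.urlopen(), which handles the 401 itself by prompting for credentials, urllib2.urlopen() raises HTTPError when no authentication handler is installed, so the exception can be caught.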

>>> try: 
...     conn = urllib2.urlopen('http://www.example.com/admin')
...     # read conn and process data
... except urllib2.HTTPError, x:
...     print 'Ignoring', x.code
...     
Ignoring 401
>>> 
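Since the question is about skipping such URLs and carrying on with the rest, the same idea can be wrapped in a loop. The following is only a minimal sketch, written for Python 3, where urllib2 was split into urllib.request and urllib.error. The function name fetch_skipping_auth and its opener parameter are hypothetical, introduced here for illustration (the opener argument just makes it possible to try the function without network access).

```python
import urllib.error
import urllib.request


def fetch_skipping_auth(urls, opener=urllib.request.urlopen):
    """Return a dict of url -> body, skipping URLs that answer with 401.

    `opener` is a hypothetical hook (defaults to urlopen) so that the
    function can be exercised without touching the network.
    """
    results = {}
    for url in urls:
        try:
            conn = opener(url)
        except urllib.error.HTTPError as err:
            if err.code == 401:
                # Authentication required: ignore this URL and move on.
                print('Ignoring', url, err.code)
                continue
            # Any other HTTP error is not ours to swallow.
            raise
        try:
            results[url] = conn.read()
        finally:
            conn.close()
    return results
```

Other HTTP errors (404, 500, ...) are re-raised here on purpose; if they should be skipped as well, the check on err.code can be widened.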
gimel
thanks a lot :)
j3nc3k