I'm trying to add authenticating proxy support to an existing script. As it stands, the script connects to an https URL (with urllib2.Request and urllib2.urlopen), scrapes the page and performs some actions based on what it finds. Initially I had hoped this would be as easy as passing a urllib2.ProxyHandler({"http": MY_PROXY}) as an argument to urllib2.build_opener, the result of which is in turn passed to urllib2.install_opener. Unfortunately this doesn't seem to work when attempting to fetch a urllib2.Request(ANY_HTTPS_PAGE). Googling around leads me to believe that the proxy support in urllib2 in Python 2.5 does not support https URLs. This surprised me, to say the least.
There appear to be solutions floating around the web. For example, http://bugs.python.org/issue1424152 contains a patch for urllib2 and httplib which purports to solve the issue (when I tried it, I began to get the following error instead: urllib2.URLError: <urlopen error (1, 'error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol')>). There is also a cookbook recipe at http://code.activestate.com/recipes/456195 which I am planning to try next. All in all, though, I'm surprised this isn't supported "out of the box", which makes me wonder if I'm simply missing an obvious solution. So, in short: has anyone got a simple method for fetching https pages using an authenticating proxy with urllib2 in Python 2.5? Ideally this would work:
import urllib2
# Perhaps the dictionary below needs a corresponding "https" entry?
# That doesn't seem to work out of the box.
proxy_handler = urllib2.ProxyHandler({"http": "http://user:pass@myproxy:port"})
urllib2.install_opener(urllib2.build_opener(urllib2.HTTPHandler,
                                            urllib2.HTTPSHandler,
                                            proxy_handler))
request = urllib2.Request(A_HTTPS_URL)
response = urllib2.urlopen(request)
print response.read()
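For context, my understanding (which may be wrong) is that an https URL cannot be sent to an HTTP proxy as an ordinary request: the client first has to ask the proxy for a tunnel with CONNECT, supply the credentials in a Proxy-Authorization header, and only then start the SSL handshake. The sketch below only builds that tunnel request as text, to show what I would expect to go over the wire. build_connect_request is a hypothetical helper of my own, not part of urllib2 or httplib, and the host, port and credentials are placeholders.

```python
import base64


def build_connect_request(host, port, user, password):
    """Return the text of an HTTP CONNECT request carrying Basic proxy auth.

    Hypothetical helper for illustration only; nothing here is urllib2 API.
    """
    # Basic auth is the base64 encoding of "user:password".
    credentials = "%s:%s" % (user, password)
    token = base64.b64encode(credentials.encode("ascii")).decode("ascii")
    lines = [
        # Ask the proxy to open a raw tunnel to the target host and port.
        "CONNECT %s:%d HTTP/1.0" % (host, port),
        "Proxy-Authorization: Basic %s" % token,
        # Two empty entries produce the blank line that ends the headers.
        "",
        "",
    ]
    return "\r\n".join(lines)


# Placeholder values; substitute the real target and proxy credentials.
tunnel_request = build_connect_request("example.com", 443, "user", "pass")
```

If the SSL handshake is attempted before a request like this has been sent and answered, the proxy would presumably reply in plain HTTP, which I suspect is what the "unknown protocol" error above is reporting.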
Many Thanks