views:

1068

answers:

3

Does urllib2 in Python 2.6.1 support proxy via https?

I've found the following at http://www.voidspace.org.uk/python/articles/urllib2.shtml:

NOTE

Currently urllib2 does not support fetching of https locations through a proxy. This can be a problem.

I'm trying automate login in to web site and downloading document, I have valid username/password.

proxy_info = {
    'host':"axxx", # commented out the real data
    'port':"1234"  # commented out the real data
}

proxy_handler = urllib2.ProxyHandler(
                 {"http" : "http://%(host)s:%(port)s" % proxy_info})
opener = urllib2.build_opener(proxy_handler,
         urllib2.HTTPHandler(debuglevel=1),urllib2.HTTPCookieProcessor())
urllib2.install_opener(opener)

fullurl = 'https://correct.url.to.login.page.com/user=a&pswd=b' # example
req1 = urllib2.Request(url=fullurl, headers=headers)
response = urllib2.urlopen(req1)

I've had it working for similar pages but not using HTTPS and I suspect it does not get through proxy - it just gets stuck in the same way as when I did not specify proxy. I need to go out through proxy.

I need to authenticate but not using basic authentication, will urllib2 figure out authentication when going via https site (I supply username/password to site via url)?

EDIT: Nope, I tested with

   proxies = {
        "http" : "http://%(host)s:%(port)s" % proxy_info,
        "https" : "https://%(host)s:%(port)s" % proxy_info
    }

    proxy_handler = urllib2.ProxyHandler(proxies)

And I get error:

urllib2.URLError: urlopen error [Errno 8] _ssl.c:480: EOF occurred in violation of protocol

+3  A: 

I'm not sure Michael Foord's article, that you quote, is updated to Python 2.6.1 -- why not give it a try? Instead of telling ProxyHandler that the proxy is only good for http, as you're doing now, register it for https, too (of course you should format it into a variable just once before you call ProxyHandler and just repeatedly use that variable in the dict): that may or may not work, but, you're not even trying, and that's sure not to work!-)

Alex Martelli
Aaah, got it :) let me try (btw I've figured out what I needed with curl but still would be nice to have it running in python)
stefanB
Unfortunately adding the proxy as the 'https' key in the dict you pass to ProxyHandler won't fix the problem as AFAIK there's no support for the CONNECT HTTP method. Using PyCurl is the easiest workaround, but for distributing code the lack of Windows support in PyCurl (or at least ease of installation) can be a big hurdle.
Tom
+1  A: 

Incase anyone else have this issue in the future I'd like to point out that it does support https proxying now, make sure the proxy supports it too or you risk running into a bug that puts the python library into an infinite loop (this happened to me).

See the unittest in the python source that is testing https proxying support for further information: http://svn.python.org/view/python/branches/release26-maint/Lib/test/test_urllib2.py?r1=74203&r2=74202&pathrev=74203

Kit Sunde
+1 thanks for the info
stefanB
+1  A: 
Gringo Suave
+1 thanks for the info
stefanB