I have a very simple problem and I am amazed that I haven't seen anything on this specifically. I am trying to follow best practices for copying a file hosted on a webserver, going through a proxy server (which does not require auth), using Python 3.

I have done similar things with Python 2.5, but I am really coming up short here. I am trying to turn this into a function I can reuse in future scripts on this network. Any assistance that can be provided would be greatly appreciated.

I have the feeling my issue lies in trying to use urllib.request or http.client without any clear documentation on how to incorporate a proxy (without auth).

I've been looking here and pulling my hair out... http://docs.python.org/3.1/library/urllib.request.html#urllib.request.ProxyHandler http://docs.python.org/3.1/library/http.client.html http://diveintopython3.org/http-web-services.html

even this Stack Overflow question: http://stackoverflow.com/questions/1450132/proxy-with-urllib2

but in Python 3 urllib2 is gone — it was merged into urllib.request...

A: 

Here is a function to retrieve a file through an HTTP proxy:

import urllib.request

def retrieve(url, filename):
    # route http requests through the proxy
    # (replace '127.0.0.1' with your proxy's actual host:port)
    proxy = urllib.request.ProxyHandler({'http': '127.0.0.1'})
    opener = urllib.request.build_opener(proxy)
    # copy the response to the local file in chunks
    with opener.open(url) as remote, open(filename, 'wb') as local:
        while True:
            data = remote.read(8192)
            if not data:
                break
            local.write(data)

(error handling is left as an exercise to the reader...)

You can also save the opener object for later use, in case you need to retrieve multiple files. The content is written to the file as-is, but it may need to be decoded afterwards if the server applied a content encoding such as gzip.
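One way to reuse the opener is to install it globally with urllib.request.install_opener, so that every subsequent urllib.request.urlopen call goes through the proxy without passing the opener around. A minimal sketch (the proxy address here is a placeholder — substitute your own host and port):

```python
import urllib.request

# Placeholder proxy address -- replace with your proxy's host:port.
proxy = urllib.request.ProxyHandler({'http': 'http://127.0.0.1:3128'})
opener = urllib.request.build_opener(proxy)

# After this, plain urllib.request.urlopen() uses the proxy too.
urllib.request.install_opener(opener)
```

Note that ProxyHandler() with no arguments reads the proxy settings from the http_proxy environment variable, which can be convenient on networks where that is already configured.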

Adrien Plisson