I need to download a web page from a known address (e.g. www.google.com) using `twisted.web.client.getPage()` or a similar method. The problem is that I am behind a proxy server, and I couldn't find any explanation of how to configure Twisted or its factories to use my proxy. Any ideas?

Bear in mind that I have to specify a user, password, host and port. On my Linux machine I set http_proxy and https_proxy to `http://user:pwd@ip:port`.

Thank you in advance.

+2  A: 
from twisted.internet import reactor
from twisted.web import client

def processResult(page):
    print "I got some data", repr(page)
    reactor.callLater(0.1, reactor.stop)

def dealWithError(err):
    print err.getErrorMessage()
    reactor.callLater(0.1, reactor.stop)

class ProxyClientFactory(client.HTTPClientFactory):
    def setURL(self, url):
        client.HTTPClientFactory.setURL(self, url)
        # A proxy expects the full URL on the request line, not just the path.
        self.path = url

factory = ProxyClientFactory('http://url_you_want')
factory.deferred.addCallbacks(processResult, dealWithError)

# Connect to the proxy itself (bare host name and port, not a URL).
reactor.connectTCP('proxy_address', 3142, factory)
reactor.run()
nosklo
This solution doesn't work for me. I understand the code is correct, but my DNS doesn't recognize the hostname when proxy_address = `http://user:pwd@host:port`.
Lex
Yeah, well, authentication is a different beast.
nosklo
Accepted answer. This one answers my question; I just needed to ask a different question :D
Lex
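The DNS failure in the comment above comes from passing a whole proxy URL where `reactor.connectTCP` wants a bare host name and a port. A minimal sketch of splitting the URL with the standard library follows. The helper name `split_proxy_url` and the address `10.0.0.1:3128` are made up for illustration, and the code only parses the string; it does not talk to Twisted.

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit      # Python 2

def split_proxy_url(proxy_url):
    """Split 'http://user:pwd@host:port' into (host, port, user, password)."""
    parts = urlsplit(proxy_url)
    return parts.hostname, parts.port, parts.username, parts.password

# Hypothetical proxy address, in the same form as the http_proxy variable.
host, port, user, pwd = split_proxy_url("http://user:pwd@10.0.0.1:3128")
```

With values like these, the call in the answer would become `reactor.connectTCP(host, port, factory)`; the user name and password still have to be sent separately as an authentication header.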
+1  A: 

To get nosklo's solution to work, you will need to add another handler for the '401' status that indicates authentication is required. Try something like this:

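import base64
import getpass

from twisted.web import client, error

# Written as a method; drop `self` if you use it as a plain function.
def checkAuthError(self, failure, url):
    # Re-raise anything that is not an HTTP error.
    failure.trap(error.Error)
    if failure.value.status == '401':
        username = raw_input("User name: ")
        password = getpass.getpass("Password: ")
        auth = base64.encodestring("%s:%s" % (username, password))
        header = "Basic " + auth.strip()
        # Retry the request, this time with credentials attached.
        return client.getPage(
            url, headers={"Authorization": header})
    else:
        return failure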

This will prompt the operator to supply the information at the command line, or you could supply the username and password in another way of your choosing. Make sure this is the first handler added as an errback, before any other handlers are added, even the callback. This also requires a few more imports to work: 'base64' and 'getpass' for the command-line prompts, and 'error' (from twisted.web).

lewisblackfan
Except, don't use `base64.encodestring()` because it may insert newlines which is Bad. Instead, use `base64.b64encode()`.
tdavis
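To illustrate the point about newlines, here is a small sketch that builds the header value with `base64.b64encode()`. The helper name `basic_auth_header` and the sample credentials are made up. In Python 3 the old `encodestring()` is called `encodebytes()`, which is used here only for comparison.

```python
import base64

def basic_auth_header(username, password):
    """Return a 'Basic ...' header value without any embedded newlines."""
    raw = ("%s:%s" % (username, password)).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

# Hypothetical credentials; the long password forces more than 76
# characters of base64 output.
long_password = "x" * 100
header = basic_auth_header("user", long_password)
has_newline = "\n" in header

# encodebytes() wraps its output every 76 characters, so even after
# strip() a newline remains in the middle of the value.
wrapped = base64.encodebytes(("user:" + long_password).encode("utf-8"))
wrapped_has_newline = b"\n" in wrapped.strip()
```

The same value can go in either an `Authorization` header, as in the answer above, or a `Proxy-Authorization` header, which is what an HTTP proxy normally looks for when it asks for credentials.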