views:

113

answers:

2

Hey, Recently I have been playing around with the HTTP Proxy in twisted. After much trial and error I think I finally I have something working. What I want to know though, is how, if it is possible, do I expand this proxy to also be able to handle HTTPS pages? Here is what I've got so far:

from twisted.internet import reactor
from twisted.web import http
from twisted.web.proxy import Proxy, ProxyRequest, ProxyClientFactory, ProxyClient



class HTTPProxyClient(ProxyClient):
    def handleHeader(self, key, value):
        print "%s : %s" % (key, value)
        ProxyClient.handleHeader(self, key, value)

    def handleResponsePart(self, buffer):
        print buffer
        ProxyClient.handleResponsePart(self, buffer)

class HTTPProxyFactory(ProxyClientFactory):
    protocol = HTTPProxyClient

class HTTPProxyRequest(ProxyRequest):
    protocols = {'http' : HTTPProxyFactory}

    def process(self):
        print self.method
        for k,v in self.requestHeaders.getAllRawHeaders():
            print "%s : %s" % (k,v)
        print "\n \n"

        ProxyRequest.process(self)

class HTTPProxy(Proxy):

    requestFactory = HTTPProxyRequest


factory = http.HTTPFactory()
factory.protocol = HTTPProxy

reactor.listenSSL(8001, factory)
reactor.run()

As this code demonstrates, for the sake of example for now I am just printing out whatever is going through the connection. Is it possible to handle HTTPS with the same classes? If not, how should I go about implementing such a thing?

A: 

I'm not sure about twisted, but I want to warn you that if you implement a HTTPS proxy, a web browser will expect the server's SSL certificate to match the domain name in the URL (address bar). The web browser will issue security warnings otherwise.

There are ways around this, such as generating certificates on the fly, but you'd need the root certificate to be trusted on the browser.

Marcus Adams
This would be true for a reverse application-layer proxy, or a transparent proxy. The question doesn't specify what kind of proxy he wants for what purpose.
MattH
To Clarify: To start I would just like to write an HTTPS proxy that can merely listen to all traffic going over the connection and print/log it. Example:Client --> makes request to SSL encrypted site --> proxy intercepts --> sends onto destinationSSL Server --> response --> proxy intercepts and reads --> client
themaestro
@MattH, the example clearly shows an application layer proxy and not a reverse one. You can call this a transparent proxy or not, depending on how the OP uses it.
Marcus Adams
@Marcus, sure the example shows an application layer HTTP proxy, but as we both seem to know that has little to do with proxying HTTPS.
MattH
Then might I ask how to proxy using HTTPS?
themaestro
+1  A: 

If you want to connect to an HTTPS website via an HTTP proxy, you need to use the CONNECT HTTP verb (because that's how a proxy works for HTTPS). In this case, the proxy server simply connects to the target server and relays whatever is sent by the server back to the client's socket (and vice versa). There's no caching involved in this case (but you might be able to log the hosts you're connecting to).

The exchange will look like this (client to proxy):

C->P: CONNECT target.host:443 HTTP/1.0
C->P:

P->C: 200 OK
P->C: 

After this, the proxy simply opens a plain socket to the target server (no HTTP or SSL/TLS yet) and relays everything between the initial client and the target server (including the TLS handshake that the client initiates). The client upgrades the existing socket it has to the proxy to use TLS/SSL (by starting the SSL/TLS handshake). Once the client has read the '200' status line, as far as the client is concerned, it's as if it had made the connection to the target server directly.

You might find out more regarding the client's behaviour in this httpclient library (which also checks the certificate).

Bruno