I am learning network programming using Twisted 10 in Python. In the code below, is there any way to detect an HTTP request when data is received? Can I also retrieve the domain name, subdomain, and port values from it, and discard the data if it's not HTTP?

from twisted.internet import stdio, reactor, protocol
from twisted.protocols import basic
import re


class DataForwardingProtocol(protocol.Protocol):
    def __init__(self):
        self.output = None              # transport that received bytes are written to
        self.normalizeNewlines = False

    def dataReceived(self, data):
        # Called with arbitrary chunks of the byte stream, not whole messages.
        if self.normalizeNewlines:
            data = re.sub(r"(\r\n|\n)", "\r\n", data)
        if self.output:
            self.output.write(data)


class StdioProxyProtocol(DataForwardingProtocol):
    def connectionMade(self):
        # Forward whatever is typed on stdin to the server connection...
        inputForwarder = DataForwardingProtocol()
        inputForwarder.output = self.transport
        inputForwarder.normalizeNewlines = True
        stdioWrapper = stdio.StandardIO(inputForwarder)
        # ...and whatever the server sends back to stdout.
        self.output = stdioWrapper
        print "Connected to server.  Press ctrl-C to close connection."


class StdioProxyFactory(protocol.ClientFactory):
    protocol = StdioProxyProtocol

    def clientConnectionLost(self, connector, reason):
        reactor.stop()

    def clientConnectionFailed(self, connector, reason):
        print reason.getErrorMessage()
        reactor.stop()


if __name__ == '__main__':
    import sys
    if not len(sys.argv) == 3:
        print "Usage: %s host port" % __file__
        sys.exit(1)

    reactor.connectTCP(sys.argv[1], int(sys.argv[2]), StdioProxyFactory())
    reactor.run()
Answer:

protocol.dataReceived, which you're overriding, is too low-level to serve this purpose without smart buffering, which you're not doing. Per the docs:

Called whenever data is received.

Use this method to translate to a higher-level message. Usually, some callback will be made upon the receipt of each complete protocol message.

Parameters:

data: a string of indeterminate length. Please keep in mind that you will probably need to buffer some data, as partial (or multiple) protocol messages may be received! I recommend that unit tests for protocols call through to this method with differing chunk sizes, down to one byte at a time.

You appear to be completely ignoring this crucial part of the docs.
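To make the buffering point concrete, here is a minimal sketch in Python 3 (not Twisted-specific, and not how twisted.web does it). `HTTPRequestSniffer` and its callbacks are hypothetical names. It accumulates chunks until the blank line that ends an HTTP/1.x request head, then uses a deliberately loose regex to decide whether the first line looks like a request line. In a real Twisted protocol you would put the same logic in your `dataReceived` override and call `self.transport.loseConnection()` in the not-HTTP branch.

```python
import re

# Loose check of an HTTP/1.x request line: METHOD SP target SP HTTP/x.y
REQUEST_LINE = re.compile(rb"^([A-Z]+) (\S+) HTTP/\d\.\d$")


class HTTPRequestSniffer:
    """Hypothetical helper: buffers raw chunks the way a dataReceived
    override must, and reports once the request head is complete."""

    def __init__(self, on_request, on_not_http, max_head_bytes=65536):
        self.on_request = on_request      # called as on_request(method, target, header_lines)
        self.on_not_http = on_not_http    # called once if the stream is not HTTP
        self.max_head_bytes = max_head_bytes
        self._buffer = b""
        self._done = False

    def dataReceived(self, data):
        if self._done:
            return                        # decision already made; ignore the rest
        self._buffer += data
        end = self._buffer.find(b"\r\n\r\n")
        if end == -1:
            # Head not complete yet. Reject early if the first line is already
            # complete and doesn't look like a request line, or if it grows too big.
            eol = self._buffer.find(b"\r\n")
            if eol != -1 and not REQUEST_LINE.match(self._buffer[:eol]):
                self._finish(None)
            elif len(self._buffer) > self.max_head_bytes:
                self._finish(None)
            return
        lines = self._buffer[:end].split(b"\r\n")
        self._finish(REQUEST_LINE.match(lines[0]), lines[1:])

    def _finish(self, match, header_lines=()):
        self._done = True
        if match is None:
            self.on_not_http()
        else:
            self.on_request(match.group(1), match.group(2), list(header_lines))
```

Following the docs' advice, you can exercise it one byte at a time:

```python
seen = []
sniffer = HTTPRequestSniffer(lambda m, t, h: seen.append((m, t, h)),
                             lambda: seen.append("not http"))
raw = b"GET /index.html HTTP/1.1\r\nHost: www.example.com:8080\r\n\r\n"
for i in range(len(raw)):
    sniffer.dataReceived(raw[i:i + 1])
```

Any request body after the blank line is simply ignored here; a proxy would need to keep it.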

You could instead use LineReceiver.lineReceived (inheriting from twisted.protocols.basic.LineReceiver, of course) to take advantage of the fact that the head of an HTTP request comes in "lines". You'll still need to join up headers that are sent as multiple lines, since, as an HTTP tutorial puts it:

Header lines beginning with space or tab are actually part of the previous header line, folded into multiple lines for easy reading.
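A minimal sketch of that unfolding step, assuming you have already collected the header lines (for example from `lineReceived`, which hands you each line without its delimiter) as Python 3 bytes. `parse_headers` is a hypothetical helper. It lower-cases names because header names are case-insensitive, and, as a simplification, a repeated header overwrites the earlier one.

```python
def parse_headers(lines):
    """Turn raw header lines (bytes, no CRLF) into a dict, unfolding
    continuation lines that start with a space or tab."""
    headers = {}
    name = None
    for raw in lines:
        line = raw.decode("latin-1")
        if line[:1] in (" ", "\t"):
            # Continuation: belongs to the previous header line.
            if name is None:
                raise ValueError("continuation line before any header")
            headers[name] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError("malformed header line: %r" % line)
        name = name.strip().lower()
        headers[name] = value.strip()
    return headers
```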

Once you have a nicely formatted/parsed request (consider studying twisted.web's sources to see one way it could be done), then, as for:

retrieve Domain name, Sub Domain, Port values from this?

the Host header (cf. RFC 2616, section 14.23) is the one containing this info.
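A rough sketch of pulling those values out of a Host header value, in Python 3. Both function names are hypothetical. The header has the form `host[:port]`, and the sketch defaults to port 80 (the http default) when no port is given. Splitting "domain" from "subdomain" is not something the header tells you: `naive_domain_parts` simply ASSUMES the registered domain is the last two labels, which is wrong for suffixes such as `co.uk`; doing it properly needs the Public Suffix List.

```python
def split_host_header(value, default_port=80):
    """Split a Host header value ("host[:port]") into (hostname, port)."""
    value = value.strip()
    if value.startswith("["):
        # Bracketed IPv6 literal, e.g. "[::1]:8080"
        end = value.index("]")
        host, rest = value[1:end], value[end + 1:]
        port_text = rest[1:] if rest.startswith(":") else ""
    else:
        host, sep, port_text = value.rpartition(":")
        if not sep:                      # no ":" at all, so no explicit port
            host, port_text = value, ""
    port = int(port_text) if port_text else default_port
    return host.lower(), port


def naive_domain_parts(hostname):
    """Return (subdomain, domain), ASSUMING the domain is the last two labels."""
    labels = hostname.split(".")
    if len(labels) <= 2:
        return "", hostname
    return ".".join(labels[:-2]), ".".join(labels[-2:])
```

For example, `split_host_header("www.Example.com:8080")` gives `("www.example.com", 8080)`, and `naive_domain_parts("www.example.com")` gives `("www", "example.com")`.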

Alex Martelli
Thanks Alex for the reply. Your answer is very useful for a newbie like me. I will get into it :)
jasimmk
no problem, I actually spent a good amount of time struggling with a similar problem and getting the twisted HTTP proxy working myself. Once you figure it out it's extremely slick.
themaestro
Answer:

Just based on what you seem to be attempting, I think the following would be the path of least resistance: http://twistedmatrix.com/documents/10.0.0/api/twisted.web.proxy.html

That's Twisted's module for building an HTTP proxy. It will let you intercept the requests, look at the destination, and look at the sender. You can also look at all the headers and the content going back and forth. You seem to be trying to rewrite the HTTP protocol and proxy classes that Twisted already provides for you. I hope this helps.

themaestro