views:

235

answers:

2

I am working on a server application which receives data over a TCP socket in an XMPP-like XML format, i.e. every child of the <root> element essentially represents one separate request (stanza). The connection is closed as soon as </root> is received. I do know that I must use a stream parser like SAX, somehow. Though, for convenience, I'd prefer to have a tree-like interface to access each stanza's child elements. (The data sent with every request is not large so I think it makes sense to read each stanza as a whole.)

What's the best way to realize that in Python (preferably v3)?

This is the code I'd like to build it in. Feel free to point me in a totally different direction to solve this issue.

import socketserver
import settings

class MyServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    pass

class MyRequestHandler(socketserver.StreamRequestHandler):
    def handle(self):
        pass

if __name__ == '__main__':
    server = MyServer((settings.host, settings.port), MyRequestHandler)
    server.serve_forever()
+1  A: 

What we did for Skates is that we use a Sax parser to build the stream, but use this parser to build a whole document for each stanza received.

Julien Genestoux
+2  A: 

You'll want to use a push based parser that emits SAX events. Basically you want a parser that you can call pushChunk(data) with a partial bit of data, and have it an event handler for the first-level child end tag event that generates your stanzas. That can then be sent to application processing logic.

If you want to see an example of this, here is the expat parser for libstrophe, an XMPP client library I wrote: http://github.com/metajack/libstrophe/blob/master/src/parser_expat.c

Building a whole document for each stanza is quite expensive. It is possible to implement this with a single parser instance, as opposed to continually making new document parsers for each stanza.

If you need a working Python version, you can probably use or pull out the code from Twisted Words (twisted.words.xish I believe).

metajack
Another trick is to use a single element pointer as the stack for your current position. When you get a new element event, you create an element in your dom. If the stack is not null, you add this element as a child to the stack element, and set the stack pointer to the new element. When you get an end element event, you set the stack pointer to the parent of the current stack pointer. If the stack pointer is nil at the end of this operation, you have a stanza. Note: this is what Jack's code linked to above more-or-less does.
Joe Hildebrand
Just in case anyone needs a Python solution for this one: http://stackoverflow.com/questions/1459648/non-blocking-method-for-parsing-streaming-xml-in-python (the post marked as accepted answer).
codethief