Some XML parsers support incremental parsing: the parser can start working on a document before it has been fully loaded. The XMLTreeBuilder class from the xml.etree.ElementTree module in the Python standard library is one such parser (current Python documentation describes it under the name XMLParser).
As the example below shows, you can feed data to the parser bit by bit as you read it from your input source. The parser calls the matching hook methods on your handler class whenever an XML "event" occurs (tag started, tag data read, tag ended), so you can process the data while the document is still being loaded:
from xml.etree.ElementTree import XMLTreeBuilder

class MyHandler(object):
    def start(self, tag, attrib):
        # Called for each opening tag.
        print(tag + " started")

    def end(self, tag):
        # Called for each closing tag.
        print(tag + " ended")

    def data(self, data):
        # Called when text inside a tag is read.
        print(data + " data read")

    def close(self):
        # Called when all data has been parsed.
        print("All data read")

handler = MyHandler()
parser = XMLTreeBuilder(target=handler)
# Feed the document in pieces; each piece is passed as a string.
parser.feed("<sometag>")
parser.feed("<sometag-child-tag>text")
parser.feed("</sometag-child-tag>")
parser.feed("</sometag>")
parser.close()
In this example the handler would receive six events (two start events, one data event, two end events and the final close) and print:
sometag started
sometag-child-tag started
text data read
sometag-child-tag ended
sometag ended
All data read
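In practice the pieces usually come from a file or socket rather than from hard-coded strings. The following is a minimal sketch of that pattern, not a definitive implementation. It assumes Python 3 and therefore uses the name XMLParser. CollectingHandler, parse_in_chunks and the chunk_size parameter are hypothetical names introduced for illustration, and an in-memory io.BytesIO stands in for the real input source.

```python
import io
from xml.etree.ElementTree import XMLParser


class CollectingHandler(object):
    """Hypothetical handler that records events instead of printing them."""

    def __init__(self):
        self.events = []

    def start(self, tag, attrib):
        self.events.append(("start", tag))

    def end(self, tag):
        self.events.append(("end", tag))

    def data(self, data):
        # Text can arrive in more than one piece, so consumers should join it.
        self.events.append(("data", data))

    def close(self):
        # Whatever is returned here is what parser.close() returns.
        return self.events


def parse_in_chunks(stream, chunk_size=16):
    """Read `stream` chunk_size bytes at a time and feed each chunk to the parser."""
    parser = XMLParser(target=CollectingHandler())
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)
    return parser.close()


# Stand-in for a file or network stream (an assumption for this sketch).
source = io.BytesIO(
    b"<sometag><sometag-child-tag>text</sometag-child-tag></sometag>"
)
events = parse_in_chunks(source)
for kind, value in events:
    print(kind, value)
```

The chunk size here is deliberately small so that chunk boundaries fall in the middle of tags; the parser buffers incomplete markup itself, so the handler still sees whole start and end events in document order.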