ansaurus

Question

Only parse a specific subtree of an XML file

Answer 1

+1 A:

I get the impression that iterparse is what you want. The "Selective tag events" section at http://codespeak.net/lxml/parsing.html describes a tag argument that restricts events to the elements you name, which seems to be what you are after:

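from lxml import etree
context = etree.iterparse(xmlfile, tag="yourSubTree")  # xmlfile: a path or file object
action, elem = next(context)  # first "end" event, so elem is the complete subtree
for event, node in etree.iterwalk(elem):
    ...  # process each node of the subtree here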

XPath could also work, but in lxml an XPath query runs against a tree that has already been parsed, so the whole file gets read first. With iterparse you can stop iterating as soon as you have a match, although the parser still has to read everything that precedes it. It would be worth profiling the two approaches.
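As a rough illustration of the "stop at the first match" idea, here is a minimal sketch. It swaps in the standard library's xml.etree.ElementTree instead of lxml, so it has no tag= argument and filters on the element name by hand. The helper name first_subtree and the sample document are made up for the example.

```python
import io
import xml.etree.ElementTree as ET


def first_subtree(source, tag):
    """Return the first element named `tag`, or None if there is none."""
    # Only "end" events are requested, so each element is complete when seen.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            # Returning here abandons the iterator, so no further data
            # is pulled from the source after the match.
            return elem
    return None


# Hypothetical sample document, used only for this sketch.
SAMPLE = (
    b"<root><skip><a/></skip>"
    b"<yourSubTree><item>1</item><item>2</item></yourSubTree>"
    b"<tail/></root>"
)

subtree = first_subtree(io.BytesIO(SAMPLE), "yourSubTree")
items = [item.text for item in subtree.iter("item")]
print(items)  # ['1', '2']
```

With lxml the name check would presumably move into the tag= argument shown above. Either way, the parser has already worked through everything before the match.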

Brian Luft 2010-03-06 00:00:40

Answer 2

A:

iterparse still has to parse everything up to the subtree you want. It might be more efficient to pull the subtree out with a regular expression first and feed only that fragment to the parser. You could also try writing a SAX parser: SAX is probably slower than lxml, but it never builds a tree, so it uses little memory and may be the better choice in some cases.
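For what the SAX route might look like, here is a minimal sketch using the standard library's xml.sax. The handler name SubtreeHandler, the sample document, and the choice to collect (tag, text) pairs are assumptions made for the example, not part of the question.

```python
import xml.sax


class SubtreeHandler(xml.sax.ContentHandler):
    """Collect (tag, text) pairs for elements in the first `target` subtree."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.depth = 0      # greater than 0 while inside the target subtree
        self.done = False   # set once the first target subtree has closed
        self.found = []     # (tag, text) pairs, in closing order
        self._text = []     # text chunks seen since the last tag boundary

    def startElement(self, name, attrs):
        if self.done:
            return
        if self.depth or name == self.target:
            self.depth += 1
            self._text = []

    def characters(self, content):
        if self.depth:
            self._text.append(content)

    def endElement(self, name):
        if self.depth:
            self.found.append((name, "".join(self._text).strip()))
            self._text = []
            self.depth -= 1
            if self.depth == 0:
                self.done = True


# Hypothetical sample document, used only for this sketch.
SAMPLE = (
    b"<root><skip><a/></skip>"
    b"<yourSubTree><item>1</item><item>2</item></yourSubTree>"
    b"<tail/></root>"
)

handler = SubtreeHandler("yourSubTree")
xml.sax.parseString(SAMPLE, handler)
print(handler.found)
```

No tree is built at any point. The handler keeps only the pairs it chooses to store, which is where the memory saving comes from. The parser still reads the whole input unless the handler aborts it, for example by raising an exception.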

mikerobi 2010-03-06 00:09:17
