Try the SAX parser from the xml.sax package in the standard library:
    from sys import argv
    from xml.sax import parse
    from xml.sax.handler import ContentHandler


    class Handler(ContentHandler):
        # The *NS methods are only called when the namespaces feature is
        # enabled; name is then a (uri, localname) tuple, so forward the
        # local name to the plain handlers below.
        def startElementNS(self, name, qname, attrs):
            self.startElement(name[1], attrs)

        def endElementNS(self, name, qname):
            self.endElement(name[1])

        def startElement(self, name, attrs):
            pass  # ... do whatever you like on tag start ...

        def characters(self, content):
            pass  # ... on tag content ...

        def endElement(self, name):
            pass  # ... on tag closing ...


    if __name__ == "__main__":
        parse(argv[1], Handler())
Here I assume argv[1] is the path to the file you'd like to parse (the first argument to parse() can be either a filename or a file-like stream). It is easy to turn this into a for loop: collect the information you need in the methods above, push it onto a list or stack, and iterate over that once parsing has finished.
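A minimal sketch of that collect-then-iterate approach, assuming you want each element's tag, attributes and direct text content. The names CollectingHandler and collect are hypothetical, chosen for illustration. It also passes a file-like object (io.BytesIO) to parse() instead of a path. Note that characters() may be called several times for one element, so the text is buffered and joined when the element closes.

```python
import io
from xml.sax import parse
from xml.sax.handler import ContentHandler


class CollectingHandler(ContentHandler):
    """Hypothetical handler: records (tag, attributes, text) tuples."""

    def __init__(self):
        super().__init__()
        self.items = []   # finished elements, in closing order
        self._stack = []  # open elements: [tag, attrs, text chunks]

    def startElement(self, name, attrs):
        # Copy attrs into a plain dict; the parser may reuse the object.
        self._stack.append([name, dict(attrs), []])

    def characters(self, content):
        # Text goes to the innermost open element; may arrive in pieces.
        if self._stack:
            self._stack[-1][2].append(content)

    def endElement(self, name):
        tag, attrs, chunks = self._stack.pop()
        self.items.append((tag, attrs, "".join(chunks).strip()))


def collect(source):
    """Parse a filename or stream and return the collected tuples."""
    handler = CollectingHandler()
    parse(source, handler)
    return handler.items


if __name__ == "__main__":
    xml = b'<books><book id="1">Dune</book><book id="2">Emma</book></books>'
    # The for loop runs over plain data once parsing is done.
    for tag, attrs, text in collect(io.BytesIO(xml)):
        print(tag, attrs, text)
```

Because elements are appended when they close, children appear before their parents; if you need document order instead, append in startElement and fill in the text later.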