ansaurus

Question

Answer 1

+2 A:

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

This is an XML document. Please, reconsider an XML parser. It will be more robust and probably take you less time in the end, even if it is more code.

Borealid 2010-07-13 04:44:09

Could you provide an example on how to use an xml parser in python? I am in the same situation than with regex.

jahmax 2010-07-13 16:53:44

@jahmax: I think Marcelo Cantos, above, has done a solid job of showing a DOM-style XML parser operating in Python.

Borealid 2010-07-13 17:30:38

Yea I used that one.

jahmax 2010-07-13 19:39:16

Answer 2

+5 A:

You could use etree:

>>> from xml.etree.ElementTree import XMLParser
>>> x = XMLParser()
>>> x.feed('<toplevel><CompleteSuggestion><suggestion data=...')
>>> tree = x.close()
>>> [(e.find('suggestion').get('data'), int(e.find('num_queries').get('int')))
     for e in tree.findall('CompleteSuggestion')]
[('test internet speed', 31800000), ('test', 686000000), ...]

It is more code than a regex, but it also does more. Specifically, it will fetch the entire list of matches in one go, and unescape any weird stuff like double-quotes in the data attribute. It also won't get confused if additional elements start appearing in the XML.

Marcelo Cantos 2010-07-13 04:53:14

ansaurus

tags:

views:

answers:

How to regex in python?

related questions