ansaurus

Question

Python regex parsing

Answer 1

+6 A:

This will get you most of the way there:

>>> import re
>>> string = '<r n="Foo Bar" t="5" s="10" l="25"/>'
>>> print(re.findall(r'(\w+)="(.*?)"', string))
[('n', 'Foo Bar'), ('t', '5'), ('s', '10'), ('l', '25')]

re.split and re.findall are complementary.

Whenever your thought process begins with "I want each item that looks like X", use re.findall. When it starts with "I want the data between and surrounding each X", use re.split.
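
To make the distinction concrete, here is a minimal sketch of both calls side by side. The element string is the one quoted elsewhere in this thread; the `csv_line` value and its separator pattern are made up purely for illustration.

```python
import re

# Sample element, as quoted elsewhere in this thread.
text = '<r n="Foo Bar" t="5" s="10" l="25"/>'

# findall: "I want each item that looks like name="value"".
# With two groups in the pattern, each match comes back as a tuple.
pairs = re.findall(r'(\w+)="(.*?)"', text)
print(pairs)

# split: "I want the data between each separator".
# csv_line is a hypothetical input, not part of the original question.
csv_line = "Foo Bar, 5;10 ,25"
fields = re.split(r'\s*[,;]\s*', csv_line)
print(fields)
```

In the first call the pattern describes the items you want; in the second it describes the separators you want to throw away.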

Clint 2009-05-02 12:34:08

Worked flawlessly, thanks.

AdamB 2009-05-02 12:36:25

Answer 2

+6 A:

<r n="Foo Bar" t="5" s="10" l="25"/>

That source looks like XML, so the "best way" would be to use an XML parsing module. If it's not exactly XML, BeautifulSoup (or rather, the BeautifulSoup.BeautifulStoneSoup class) may work best, as it's good at dealing with possibly invalid XML (or things that "aren't quite XML"):

>>> from BeautifulSoup import BeautifulStoneSoup
>>> soup = BeautifulStoneSoup("""<r n="Foo Bar" t="5" s="10" l="25"/>""")

# grab the "r" element (you could also use soup.findAll("r") if there are multiple)
>>> soup.find("r")
<r n="Foo Bar" t="5" s="10" l="25"></r>

# get a specific attribute
>>> soup.find("r")['n']
u'Foo Bar'
>>> soup.find("r")['t']
u'5'

# Get all attributes, or turn them into a regular dictionary
>>> soup.find("r").attrs
[(u'n', u'Foo Bar'), (u't', u'5'), (u's', u'10'), (u'l', u'25')]
>>> dict(soup.find("r").attrs)
{u's': u'10', u'l': u'25', u't': u'5', u'n': u'Foo Bar'}
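
If the input really is well-formed XML, the "XML parsing module" route could look something like the sketch below. It uses the standard library's xml.etree.ElementTree rather than BeautifulSoup, which is a substitution on my part, and it assumes the whole input is a single valid element. Malformed input raises xml.etree.ElementTree.ParseError instead of being repaired.

```python
import xml.etree.ElementTree as ET

# Same sample element as above; assumed to be well-formed XML.
source = '<r n="Foo Bar" t="5" s="10" l="25"/>'

# Parse the string; the returned object is the root element itself.
root = ET.fromstring(source)
print(root.tag)

# Get a specific attribute (returns None if it is missing).
print(root.get("n"))

# All attributes as a regular dictionary.
attrs = dict(root.attrib)
print(attrs)
```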

dbr 2009-05-02 13:32:54
