I have an array of strings in Python, with each string in the array looking something like this:

<r n="Foo Bar" t="5" s="10" l="25"/>

I have been searching around for a while, and the best thing I could find is attempting to modify an HTML hyperlink regex into something that will fit my needs.

But since I don't really know much about regex, I haven't gotten anything to work yet. This is what I have so far:

string = '<r n="Foo Bar" t="5" s="10" l="25"/>'
print re.split("<r\s+n=(?:\"(^\"]+)\").*?/>", string)

What would be the best way to extract the values of n, t, s, and l from that string?

+6  A: 

This will get you most of the way there:

>>> print re.findall(r'(\w+)="(.*?)"', string)
[('n', 'Foo Bar'), ('t', '5'), ('s', '10'), ('l', '25')]

re.split and re.findall are complementary.

Every time your thought process begins with "I want each item that looks like X", then you should use re.findall. When it starts with "I want the data between and surrounding each X", use re.split.
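
As a rough illustration of that rule of thumb, here is a minimal sketch that uses both functions. The comma-separated string passed to re.split is a made-up example, not something from the question, and the code is written to run on Python 3:

```python
import re

string = '<r n="Foo Bar" t="5" s="10" l="25"/>'

# "I want each item that looks like X": findall returns every
# name="value" pair as a (name, value) tuple.
pairs = re.findall(r'(\w+)="(.*?)"', string)
attrs = dict(pairs)  # turn the pairs into a lookup table
print(attrs["n"])

# "I want the data between and surrounding each X": split returns
# the text on either side of every separator match.
# (hypothetical input, only here to show the contrast)
parts = re.split(r'\s*,\s*', 'n=Foo Bar, t=5 ,s=10')
print(parts)
```

Converting the findall result to a dict is just one convenient option; it assumes no attribute name is repeated within a single string.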

Clint
Worked flawlessly, thanks.
AdamB
+6  A: 
<r n="Foo Bar" t="5" s="10" l="25"/>

That source looks like XML, so "the best way" would be to use an XML parsing module. If it's not exactly XML, BeautifulSoup (or rather, the BeautifulSoup.BeautifulStoneSoup class) may work best, as it's good at dealing with possibly-invalid XML (or things that "aren't quite XML"):

>>> from BeautifulSoup import BeautifulStoneSoup
>>> soup = BeautifulStoneSoup("""<r n="Foo Bar" t="5" s="10" l="25"/>""")

# grab the "r" element (you could also use soup.findAll("r") if there are multiple)
>>> soup.find("r")
<r n="Foo Bar" t="5" s="10" l="25"></r>

# get a specific attribute
>>> soup.find("r")['n']
u'Foo Bar'
>>> soup.find("r")['t']
u'5'

# Get all attributes, or turn them into a regular dictionary
>>> soup.find("r").attrs
[(u'n', u'Foo Bar'), (u't', u'5'), (u's', u'10'), (u'l', u'25')]
>>> dict(soup.find("r").attrs)
{u's': u'10', u'l': u'25', u't': u'5', u'n': u'Foo Bar'}
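
If each string really is well-formed XML, the standard library's xml.etree.ElementTree is one XML parsing module that could do the same job. The following is only a sketch under that assumption; parse_attrs is a hypothetical helper name, and the code is written to run on Python 3:

```python
import xml.etree.ElementTree as ET


def parse_attrs(line):
    """Return the attributes of a single XML element as a plain dict.

    Assumes `line` holds exactly one well-formed element;
    ET.fromstring raises ET.ParseError otherwise.
    """
    elem = ET.fromstring(line)
    return dict(elem.attrib)


# The question's sample string, wrapped in a list to mimic the array
lines = ['<r n="Foo Bar" t="5" s="10" l="25"/>']

for line in lines:
    attrs = parse_attrs(line)
    print(attrs["n"])
    print(attrs["t"])
```

Unlike BeautifulStoneSoup, ElementTree does not tolerate malformed input, so this only fits if every string in the array is valid XML.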
dbr