How do I get the text inside the <a> and <b> tags using regular expressions?

<a href="/model.xml?hid=90971&amp;modelid=4636873&amp;show-uid=678650012772883921" class="b-offers__name"><b>LG</b> X110</a>

I.e., I want to get:

LG X110
+7  A: 

You don't.

Regular Expressions are not well suited to deal with the nested structure of HTML. Use an HTML parser instead.

Jens
+1  A: 

Try this...

<a.*<b>(.*)</b>(.*)</a>

$1 and $2 should be what you want, or whatever means Python has for printing captured groups.
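In Python, the captured groups are available through the match object. The following is only a sketch of how the pattern above could be applied to the markup from the question; the variable names are illustrative, and the greedy `.*` makes it reliable only when the input holds a single link like this one.

```python
import re

# The markup from the question, split across lines for readability.
html = ('<a href="/model.xml?hid=90971&amp;modelid=4636873'
        '&amp;show-uid=678650012772883921" class="b-offers__name">'
        '<b>LG</b> X110</a>')

# Group 1 is the text inside <b>, group 2 is the text after </b>.
pattern = re.compile(r'<a.*<b>(.*)</b>(.*)</a>')

match = pattern.search(html)
if match:
    result = match.group(1) + match.group(2)
    print(result)  # LG X110
```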

Adrian Regan
Python, not PHP...
msanders
+1  A: 

Your question was very hard to understand, but from the given output example it looks like you want to strip everything within < and > from the input text. That can be done like so:

import re
input_text = '<a bob>i <b>c</b></a>'
# Remove every <...> run, keeping only the text between tags
output_text = re.sub(r'<[^>]*>', '', input_text)
print(output_text)

Which gives you:

i c

If that is not what you want, please clarify.

Please note that the regular expression approach to parsing XML is very brittle. For instance, the above example would break on the input <a name="b>c">hey</a>. (> is a valid character in an attribute value: see the XML specification.)
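To make that failure concrete, here is a small sketch that runs the same substitution on the tricky input and, for comparison, reads it with the standard library's xml.etree.ElementTree (my choice for the comparison; any real XML parser should behave the same way).

```python
import re
import xml.etree.ElementTree as ET

tricky = '<a name="b>c">hey</a>'

# The regex stops at the first '>', which sits inside the attribute value,
# so part of the attribute leaks into the output.
stripped = re.sub(r'<[^>]*>', '', tricky)
print(stripped)  # c">hey

# An XML parser understands quoting and keeps the attribute intact.
elem = ET.fromstring(tricky)
print(elem.get('name'), elem.text)  # b>c hey
```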

Deestan
A: 

+1 for Jens's answer. lxml is a good library for parsing this robustly. If you'd prefer something in the standard library, you can use xml.sax, xml.dom, or xml.etree.ElementTree.
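As a rough illustration of the standard-library route, the snippet from the question can be read with ElementTree. This sketch assumes the fragment is well-formed XML, which happens to be true here (the ampersands are escaped as &amp;amp;); real-world HTML pages often are not, and would raise a ParseError.

```python
import xml.etree.ElementTree as ET

# The markup from the question, split across lines for readability.
html = ('<a href="/model.xml?hid=90971&amp;modelid=4636873'
        '&amp;show-uid=678650012772883921" class="b-offers__name">'
        '<b>LG</b> X110</a>')

link = ET.fromstring(html)

# itertext() walks the element and its children in document order,
# yielding 'LG' (inside <b>) and then ' X110' (the text after </b>).
text = ''.join(link.itertext())
print(text)  # LG X110
```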

Noufal Ibrahim
+5  A: 

Don't use regular expressions for parsing HTML. Use an HTML parser like BeautifulSoup. Just look how easy it is:

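from bs4 import BeautifulSoup  # Beautiful Soup 4 (the beautifulsoup4 package)
html = r'<a href="removed because it was too long"><b>LG</b> X110</a>'
soup = BeautifulSoup(html, 'html.parser')
# Collect every text node in document order and join them
print(''.join(soup.find_all(string=True)))
# LG X110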
DzinX