ansaurus

Question

Extracting a tag value in BeautifulSoup when unable to match by position or attributes.

Answer 1

A:

BeautifulSoup (version 3) is kind of dead, since the SGMLParser module it builds on is deprecated. I suggest you use the lxml library instead. It even has XPath support.

from lxml import html

text = '''
<span style="font-family: arial;">
    <span style="font-weight: bold;">Artist:</span>M.I.A.<br>
</span>
'''

# Parse the fragment, then join the text nodes that sit directly inside the outer span.
doc = html.fromstring(text)
print(''.join(doc.xpath("//span/span[text()='Artist:']/../text()")).strip())

This XPath expression means "find the span tag that is inside another span tag and whose text is exactly 'Artist:', then grab all the text nodes that are direct children of its parent tag". It prints M.I.A., as one would expect.
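
If the parent span holds several label/value pairs, joining all of its text would run the values together. A minimal sketch of one way around that, assuming each value is the text node immediately after its label span: the helper name value_after_label is hypothetical, and the label is passed in as an XPath variable through lxml's keyword arguments rather than being pasted into the expression.

```python
from lxml import html


def value_after_label(markup, label):
    """Return the text right after <span>label</span>, or None if the label is absent.

    Hypothetical helper for illustration; assumes the value directly follows the label span.
    """
    doc = html.fromstring(markup)
    # following-sibling::text()[1] selects only the first text node after the label span.
    nodes = doc.xpath(
        "//span[text()=$label]/following-sibling::text()[1]",
        label=label,
    )
    return nodes[0].strip() if nodes else None


text = '''
<span style="font-family: arial;">
    <span style="font-weight: bold;">Artist:</span>M.I.A.<br>
</span>
'''

print(value_after_label(text, "Artist:"))
```

For the sample markup this should print M.I.A.; a label that does not occur in the markup gives None instead of raising an error.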

nosklo 2010-08-06 11:27:46
