ansaurus

Question

Answer 1

+4 A:

BeautifulSoup's search methods accept a callable, which the docs appear to recommend for your case: "If you need to impose complex or interlocking restrictions on a tag's attributes, pass in a callable object for name,...". (OK, they're talking about attributes specifically, but the advice reflects the underlying spirit of the BeautifulSoup API.)

If you want a one-liner:

soup.findAll(lambda tag: tag.name == 'a' and
             tag.findParent('strong', 'sans') and
             tag.findParent('strong', 'sans').findParent('td', attrs={'width': '50%'}))

I've used a lambda in this example, but in practice you may want to define a named function if you have multiple chained requirements. This lambda has to call findParent('strong', 'sans') twice to avoid raising an exception when an <a> tag has no matching strong parent. A proper function can store the result of the first call and reuse it, which makes the test more efficient.
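For what it's worth, here is a rough sketch of what such a function might look like. It assumes the newer bs4 package rather than BeautifulSoup 3, so it uses find_all/find_parent instead of findAll/findParent. The function name is_target_link and the sample HTML are made up for illustration and are not from the original question.

```python
from bs4 import BeautifulSoup


def is_target_link(tag):
    """Match <a> inside <strong class="sans"> inside <td width="50%">."""
    if tag.name != "a":
        return False
    # Look up the enclosing <strong class="sans"> once and reuse the result.
    strong = tag.find_parent("strong", class_="sans")
    if strong is None:
        return False
    return strong.find_parent("td", attrs={"width": "50%"}) is not None


# Hypothetical sample markup: only the first link satisfies every condition.
html = """<table><tr>
<td width="50%"><strong class="sans"><a href="http:/website">Site</a></strong></td>
<td width="30%"><strong class="sans"><a href="/other">Other</a></strong></td>
<td width="50%"><a href="/plain">Plain</a></td>
</tr></table>"""

soup = BeautifulSoup(html, "html.parser")
links = soup.find_all(is_target_link)
print([a["href"] for a in links])
```

The early returns do the job of the chained `and` in the lambda, but each findParent-style lookup runs at most once per tag.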

Jarret Hardie 2009-04-01 17:15:12

Answer 2

A:

>>> import BeautifulSoup
>>> soup = BeautifulSoup.BeautifulSoup("""<html><td width="50%">
...     <strong class="sans"><a href="http:/website">Site</a></strong> <br />
... </html>""")
>>> soup
<html><td width="50%">
<strong class="sans"><a href="http:/website">Site</a></strong> <br />
</td></html>
>>> # The for clauses run outermost first: td, then strong, then a.
>>> [a for td in soup.findAll("td", width="50%")
...    for strong in td.findAll("strong", attrs={"class": "sans"})
...    for a in strong.findAll("a")]
[<a href="http:/website">Site</a>]

Aaron Maenpaa 2009-04-01 17:19:33


Complex Beautiful Soup query
