ansaurus

Question

Using BeautifulSoup's findAll to search an HTML element's inner text and get the same result as searching attributes?

Answer 1


You don't get back the plain text. You get a NavigableString containing the text, and that object lets you navigate the tree, for example to its parent tag.

from BeautifulSoup import BeautifulSoup
import re

soup = BeautifulSoup('<html><p>foo</p></html>')

r = soup.findAll('p', text=re.compile('foo'))

print r[0].parent

prints

<p>foo</p>
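For readers on Python 3, here is a rough sketch of the same idea using the newer `bs4` package. It assumes `bs4` is installed and uses its `find_all` method with the `string=` argument; the sample HTML and the variable names are made up for illustration. One difference worth noting: when a tag name is passed together with `string=`, `bs4` returns the matching tags themselves, so the `.parent` step is only needed when searching for text nodes alone.

```python
import re

from bs4 import BeautifulSoup  # assumption: the bs4 package is installed

# Hypothetical sample markup, not taken from the question.
html = '<html><p>foo</p><a href="/c">comments</a><a href="/x">other</a></html>'
soup = BeautifulSoup(html, "html.parser")

# With no tag name, string= matches text nodes (NavigableString objects),
# so we step up to .parent to get the enclosing tags.
strings = soup.find_all(string=re.compile("foo"))
parents = [s.parent for s in strings]
print(parents[0])  # <p>foo</p>

# With a tag name, bs4 returns the tags whose .string matches the pattern.
links = soup.find_all("a", string=re.compile("comment"))
hrefs = [a["href"] for a in links]
```

The list comprehension over `.parent` plays the same role as mapping over the results in the older API.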

nosklo 2010-04-05 19:14:33

Super thanks. Basically, to get what I wanted I just had to map the results, like so: `comments = map(lambda x: x.parent, soup.findAll('a', text=re.compile(".discuss|comment.")))`

Jack 2010-04-05 19:21:56

`map` with `lambda` is ugly, so I'd just do `[s.parent for s in soup.findAll(...)]`

nosklo 2010-04-05 19:36:19

@Jack: also worth checking out is `lxml.html`. I prefer it over `BeautifulSoup`, since the latter is not being maintained anymore and is slower.

nosklo 2010-04-08 11:39:12
