For instance, if I search by an element's attribute such as id:

soup.findAll('span',{'id':re.compile("^score_")})

I get back a list of the whole span elements that match (which I like).

But if I try to search by the innerText of the html element like this:

soup.findAll('a',text = re.compile("discuss|comment")) 

I get back only the matching innerText, not the whole element with its tags and attributes as in the first example.

Is it possible to do this without finding the match and then getting its parent?

Thanks.

+1  A: 

You don't get back the text. You get a NavigableString with the text. That object has methods to go to the parent, etc.

from BeautifulSoup import BeautifulSoup
import re

soup = BeautifulSoup('<html><p>foo</p></html>')

r = soup.findAll('p', text=re.compile('foo'))

print r[0].parent

prints

<p>foo</p>
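For anyone on a newer setup, here is a minimal sketch of the same idea. It assumes Python 3 and BeautifulSoup 4 (the `bs4` package, with `find_all` and the `string=` argument) rather than the BeautifulSoup 3 import used above. The sample HTML and the names `html`, `pattern` and `links` are made up for illustration.

```python
import re

# Assumption: BeautifulSoup 4 is installed as the bs4 package.
from bs4 import BeautifulSoup

# Hypothetical sample markup, loosely modelled on the question.
html = """
<html><body>
<a href="/item?id=1">discuss</a>
<a href="/item?id=2">12 comments</a>
<a href="/about">about</a>
<span id="score_1">5 points</span>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
pattern = re.compile("discuss|comment")

# Searching by string returns NavigableString objects, not tags.
matches = soup.find_all(string=pattern)

# Step up to each string's parent tag and keep only the <a> elements.
links = [s.parent for s in matches if s.parent.name == "a"]

for link in links:
    print(link)
```

Each item in `links` is a full tag object, so attributes such as `link["href"]` are available just as they are when searching by id.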
nosklo
Super, thanks. Basically, to get what I wanted I just had to map over the results, like so: `comments = map(lambda x: x.parent, soup.findAll('a', text=re.compile(".discuss|comment.")))`
Jack
`map` with `lambda` is ugly, so I'd just do `[s.parent for s in soup.findAll(...)]`
nosklo
@Jack: also worth checking out is `lxml.html` - I prefer it over `BeautifulSoup`, since the latter is no longer being maintained and is slower.
nosklo