ansaurus

Question

BeautifulSoup: just get inside of a tag, no matter how many enclosing tags there are

Answer 1

Short answer: soup.findAll(text=True)

This has already been answered, both here on StackOverflow and in the BeautifulSoup documentation.

UPDATE:

To clarify, a working piece of code:

>>> txt = """\
<p>Red</p>
<p><i>Blue</i></p>
<p>Yellow</p>
<p>Light <b>green</b></p>
"""
>>> import BeautifulSoup
>>> BeautifulSoup.__version__
'3.0.7a'
>>> soup = BeautifulSoup.BeautifulSoup(txt)
>>> for node in soup.findAll('p'):
...     print ''.join(node.findAll(text=True))
...
Red
Blue
Yellow
Light green
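
The session above uses BeautifulSoup 3 on Python 2. For readers on Python 3, a rough equivalent is sketched below. It assumes the third-party bs4 package (BeautifulSoup 4) is installed, which the original answer does not use. There, find_all(string=True) plays the role of findAll(text=True), and get_text() is a shortcut for joining the result.

```python
# A minimal sketch, assuming the bs4 package (BeautifulSoup 4) is installed.
from bs4 import BeautifulSoup

txt = """\
<p>Red</p>
<p><i>Blue</i></p>
<p>Yellow</p>
<p>Light <b>green</b></p>
"""

# "html.parser" is Python's built-in parser, so no extra parser library is needed.
soup = BeautifulSoup(txt, "html.parser")

# find_all(string=True) returns every text node under a tag, however deeply
# nested; joining them gives the tag's text with the inner markup dropped.
lines = ["".join(p.find_all(string=True)) for p in soup.find_all("p")]

# get_text() performs the same concatenation in a single call.
texts = [p.get_text() for p in soup.find_all("p")]

for line in lines:
    print(line)
```

Both lists hold one string per paragraph, so either form works when only the text inside a tag is wanted.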

taleinat 2010-06-02 11:27:40

Thanks! I'd looked at both of those, but failed to extract the important bit of the StackOverflow question - and I find the BeautifulSoup documentation is only really useful if you already know what you're doing. Or maybe I just need more coffee.

AP257 2010-06-02 11:45:37

Actually, it doesn't work (see update).

AP257 2010-06-02 11:50:37

print ''.join(soup.findAll(text=True))

Drew Sears 2010-06-02 18:26:51

I have added a working code example to illustrate how to use `.findAll(text=True)` to get what you want.

taleinat 2010-06-04 13:24:58
