I'm currently reformatting some HTML pages with BeautifulSoup, and I ran into a bit of a problem.

My problem is that the original HTML has things like this:

<li><p>stff</p></li>

and

<li><div><p>Stuff</p></div></li>

as well as

<li><div><p><strong>stff</strong></p></div></li>

With BeautifulSoup I hope to eliminate the div and p tags, if they exist, but keep the strong tag.

I've looked through the BeautifulSoup documentation and couldn't find anything that does this. Any ideas?

Thanks.

A: 

What you want to do can be done using replaceWith. You have to duplicate the element you want to use as the replacement, and then feed that as the argument to replaceWith. The documentation for replaceWith is pretty clear on how to do this.
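As a rough illustration of the idea, here is a minimal sketch. It assumes the newer bs4 package rather than the older BeautifulSoup 3 API, and instead of copying the element and calling replaceWith by hand it uses bs4's `Tag.unwrap()`, which replaces a tag with its own children. The function name `unwrap_in_list_items` and the choice of the built-in `html.parser` are illustrative assumptions, not something from the question.

```python
from bs4 import BeautifulSoup


def unwrap_in_list_items(html, tags=("div", "p")):
    """Remove the given wrapper tags inside every <li>, keeping their contents.

    Hypothetical helper for illustration; assumes bs4 is installed.
    """
    soup = BeautifulSoup(html, "html.parser")
    for li in soup.find_all("li"):
        # find_all returns a materialized list, so unwrapping tags
        # while looping over the result is safe.
        for tag in li.find_all(list(tags)):
            # unwrap() replaces the tag with its children, so a nested
            # <strong> survives while the <div>/<p> shell is dropped.
            tag.unwrap()
    return str(soup)


if __name__ == "__main__":
    print(unwrap_in_list_items("<li><div><p><strong>stff</strong></p></div></li>"))
    # <li><strong>stff</strong></li>
```

Because only div and p tags found inside an li are touched, any div or p elsewhere on the page is left alone.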

jathanism
A: 

You can write your own function to strip tags:

import re

def strip_tags(string):
    # Remove every tag: non-greedy match from '<' up to the next '>'.
    return re.sub(r'<.*?>', '', string)

print(strip_tags("<li><div><p><strong>stff</strong></p></div><li>"))
# stff
suzanshakya
Yeah, but I actually want to keep the <strong> tag. Also, after looking through a couple of random pages, there are only div and p tags to worry about.
ultimatebuster
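Given that only div and p need to go, the regex approach could be narrowed to just those two tags. The following is a minimal sketch under that assumption; the name `strip_div_and_p` is hypothetical, and a regex like this is fragile on unusual markup (for example a `>` inside an attribute value), so a real parser is the safer choice for anything beyond simple pages.

```python
import re

# Matches opening or closing <div> and <p> tags, with or without attributes.
# The \b keeps tags such as <pre> or <param> from matching as <p>.
DIV_OR_P = re.compile(r"</?(?:div|p)\b[^>]*>", re.IGNORECASE)


def strip_div_and_p(html):
    """Drop div and p tags but leave their contents and all other tags."""
    return DIV_OR_P.sub("", html)


if __name__ == "__main__":
    print(strip_div_and_p("<li><div><p><strong>stff</strong></p></div></li>"))
    # <li><strong>stff</strong></li>
```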