I'm working on some screen-scraping software and have run into an issue with Beautiful Soup. I'm using Python 2.4.3 and Beautiful Soup 3.0.7a.
I need to remove an <hr> tag, but it can have many different attributes, so a simple replace() call won't cut it.
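For comparison, and outside Beautiful Soup entirely, a pattern-based removal can cope with arbitrary attributes where a fixed replace() cannot. This is a minimal sketch, assuming no attribute value contains a literal ">" character; the names HR_PATTERN and strip_hr are hypothetical, and regexes are a fragile substitute for a real parser on messy HTML.

```python
import re

# Matches <hr>, <hr/>, <hr />, and <hr ...any attributes...>, case-insensitively.
# The \b stops it from matching longer tag names that merely start with "hr".
# Assumption: no attribute value contains a literal '>' character.
HR_PATTERN = re.compile(r'<hr\b[^>]*>', re.IGNORECASE)


def strip_hr(html):
    """Return html with every <hr> tag removed, whatever its attributes."""
    return HR_PATTERN.sub('', html)


if __name__ == '__main__':
    sample = '<h1>foo</h1>\n<h2><hr class="x" width="50%"/>bar</h2>'
    print(strip_hr(sample))
```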
Given the following HTML:
<h1>foo</h1>
<h2><hr/>bar</h2>
And the following code:
from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(string)
# Remove every <hr> tag, whatever attributes it carries
for tag in soup.findAll('hr'):
    tag.extract()
for i in soup.findAll(['h1', 'h2']):
    print i
    print i.string
The output is:
<h1>foo</h1>
foo
<h2>bar</h2>
None
Am I misunderstanding the extract function, or is this a bug with Beautiful Soup?
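If all that's needed is the heading text, one workaround that does not depend on .string (which only holds a value when a tag has a single string child) is to join every text node inside each heading. Within Beautiful Soup 3 itself, ''.join(i.findAll(text=True)) should do this. The sketch below shows the same idea with Python 3's standard html.parser module instead of Beautiful Soup (a swapped-in technique); the HeadingText class and heading_texts function are hypothetical names, and nested headings are not handled.

```python
from html.parser import HTMLParser


class HeadingText(HTMLParser):
    """Collect the joined text of each <h1>/<h2>, ignoring child tags such as <hr>."""

    HEADINGS = {'h1', 'h2'}

    def __init__(self):
        super().__init__()
        self.results = []     # list of (tag name, text) pairs, in document order
        self._current = None  # heading tag currently open, if any
        self._chunks = []     # text pieces seen inside the open heading

    def handle_starttag(self, tag, attrs):
        # html.parser passes lowercased tag names; <hr/> also arrives here.
        if tag in self.HEADINGS:
            self._current = tag
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == self._current:
            self.results.append((tag, ''.join(self._chunks)))
            self._current = None

    def handle_data(self, data):
        # Keep only text that sits inside an open heading.
        if self._current is not None:
            self._chunks.append(data)


def heading_texts(html):
    """Return [(tag, text), ...] for every <h1>/<h2> in html."""
    parser = HeadingText()
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == '__main__':
    print(heading_texts('<h1>foo</h1>\n<h2><hr/>bar</h2>'))
```

Because the text is assembled from all child text nodes, the result is the same whether or not the <hr> is still present.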