Starting from HTML input like this:

<p>
<a href="http://www.foo.com" rel="nofollow">this is foo</a>
<a href="http://www.bar.com" rel="nofollow">this is bar</a>
</p>

is it possible to modify the <a> node values ("this is foo" and "this is bar") by adding the suffix "_PARSED" to each value, without recreating the whole link?
The result needs to look like this:

<p>
<a href="http://www.foo.com" rel="nofollow">this is foo_PARSED</a>
<a href="http://www.bar.com" rel="nofollow">this is bar_PARSED</a>
</p>

The code should be something like:

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(html)
for link_tag in soup.findAll('a'):
    link_tag.string = link_tag.string + '_PARSED' #This obviously does not work
A:

If I understand you correctly, you're nearly there. Change your code to:

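from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3.x
soup = BeautifulSoup(html)
for link_tag in soup.findAll('a'):
    # Assigning to .string replaces only the tag's text; href/rel are kept
    link_tag.string = link_tag.string + '_PARSED'
html_out = soup.renderContents()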

If we print out html_out we get:

>>> print html_out
<p>
<a href="http://www.foo.com" rel="nofollow">this is foo_PARSED</a>
<a href="http://www.bar.com" rel="nofollow">this is bar_PARSED</a>
</p>

which I think is what you wanted.
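For comparison, when the markup is well-formed XHTML (as the fragment in the question is), the same in-place edit can be sketched with only the standard library's xml.etree.ElementTree. This is a swapped-in technique, not BeautifulSoup: ElementTree rejects ordinary "tag soup" HTML, which is exactly what BeautifulSoup is built to handle. The helper name add_suffix_to_links is hypothetical. Only each <a> element's .text is changed, so the attributes and surrounding whitespace survive untouched.

```python
import xml.etree.ElementTree as ET

html = """<p>
<a href="http://www.foo.com" rel="nofollow">this is foo</a>
<a href="http://www.bar.com" rel="nofollow">this is bar</a>
</p>"""

def add_suffix_to_links(markup, suffix="_PARSED"):
    """Hypothetical helper: append `suffix` to the text of every <a> element.

    Assumes `markup` is well-formed XML/XHTML; ET.fromstring raises
    ParseError on malformed HTML.
    """
    root = ET.fromstring(markup)
    for link in root.iter("a"):
        # Only the text node changes; the element and its attributes are reused.
        link.text = (link.text or "") + suffix
    # encoding="unicode" makes tostring return a str instead of bytes
    return ET.tostring(root, encoding="unicode")

html_out = add_suffix_to_links(html)
print(html_out)
```

Because the existing elements are mutated rather than rebuilt, nothing about the links except their text is touched, which is the same property the BeautifulSoup answer relies on.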

Justin Peel
link_tag.string = link_tag.string + '_PARSED' does not work.
systempuntoout
@systempuntoout, are you sure? It works beautifully for me. I'm using version 3.0.8.1.
Justin Peel
I was using 3.0.7a, installed via DarwinPorts. With 3.0.8 it works like a charm :) Thanks.
systempuntoout
That's a relief. Glad it works!
Justin Peel