views:

479

answers:

3

Hello,

I need to be able to modify every single link in an HTML document. I know that I need to use SoupStrainer, but I'm not 100% sure how to implement it. If someone could direct me to a good resource or provide a code example, it'd be very much appreciated.

Thanks.

+4  A: 

Maybe something like this would work? (I don't have a Python interpreter in front of me, unfortunately)

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('<p>Blah blah blah <a href="http://google.com">Google</a></p>')
for a in soup.findAll('a')
  a.href = a.href.replace("google", "mysite")

result = str(soup)
Lusid
Thanks a lot. There were a few problems, but I think that's because you didn't have a chance to test. Works great. :-)
Evan Fosmark
+4  A: 
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('<p>Blah blah blah <a href="http://google.com">Google</a></p>')
for a in soup.findAll('a'):
    a['href'] = a['href'].replace("google", "mysite")
print str(soup)

This is Lusid's solution, but since he didn't have a Python interpreter in front of him, he wasn't able to test it and it had a few errors. I just wanted to post the working version. Thanks, Lusid!

Evan Fosmark
You probably want to check the crappy-HTML edge case where the a element you're testing doesn't have an href.
Robert Rossney
@Robert, yes you're right. I'll be sure to do so. Thanks for the heads up.
Evan Fosmark
@Evan, glad I was able to at least help you get there. My Python is a tad bit on the rusty side. :)
Lusid
`soup.prettify()` is easier for human eyes than `str(soup)`.
J.F. Sebastian
Actually, that's a pretty common situation for named anchor tags: <a name='foo'></a>
Peter Rowell
So much information passes through my brain. So little of it stays around.
Robert Rossney
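The missing-href case raised in the comments above (for example a named anchor like <a name='foo'></a>) can probably be handled by only selecting tags that actually have an href. This is a minimal sketch, assuming the newer bs4 package (BeautifulSoup 4) on Python 3 rather than the BeautifulSoup 3 import used in the answers; the function name rewrite_links and the sample HTML are made up for illustration.

```python
# Assumption: BeautifulSoup 4 (the bs4 package) on Python 3,
# not the BeautifulSoup 3 import used in the answers above.
from bs4 import BeautifulSoup


def rewrite_links(html, old, new):
    """Replace `old` with `new` in every href (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True matches only <a> tags that carry an href attribute,
    # so named anchors such as <a name="foo"></a> are skipped
    # instead of raising a KeyError on a["href"].
    for a in soup.find_all("a", href=True):
        a["href"] = a["href"].replace(old, new)
    return str(soup)


# Sample input containing one named anchor and one real link.
html = '<p><a name="foo"></a><a href="http://google.com">Google</a></p>'
result = rewrite_links(html, "google", "mysite")
print(result)
```

Note that this parses the whole document on purpose. A SoupStrainer passed as parse_only would keep only the <a> tags, which is fine for reading links but not for writing the full modified document back out.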
A: 

Hi guys, I am writing a program in Python that crawls a website. I am not that good with Python. Could anyone please write some simple code that just gets the links on a website and stores them in an array or something?

Please

Adejumo Magbagbeola
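For the request above, here is one possible sketch that collects the links of a page into a list. It deliberately uses Python 3's standard-library html.parser instead of BeautifulSoup so it runs without extra packages. The names LinkCollector and extract_links and the sample HTML are assumptions for illustration, and fetching the page over the network is left out.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                # Skip anchors without an href, e.g. <a name="top"></a>.
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return a list of all href values found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Sample input standing in for a downloaded page.
sample = '<p><a href="http://example.com/a">A</a> <a name="top"></a> <a href="/b">B</a></p>'
links = extract_links(sample)
print(links)
```

In a real crawler the HTML string would come from a downloaded page, and relative links such as /b would still need to be resolved against the page URL, for instance with urllib.parse.urljoin.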