ansaurus

Question

Trying to grab just absolute links from a webpage using BeautifulSoup

Answer 1

A:

Do you possibly have some <a> tags without href attributes? Internal link targets, perhaps?

Andrew Aylett 2010-03-23 17:25:14

Answer 2

A:

Please give us an idea of what you're parsing here - as Andrew points out, it seems likely that there are some anchor tags without associated hrefs.

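If you really want to ignore them, you could wrap the href lookup in a try block and catch the resulting exception with

except KeyError: pass

But that has its own issues: it silently swallows every KeyError raised inside the try block, not just the one caused by a missing href.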
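
As a rough illustration of skipping anchors without an href and without a try block, here is a minimal sketch that uses only Python 3's standard-library html.parser rather than BeautifulSoup. The class name HrefCollector and the sample markup are made up for this example.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect href values from <a> tags, skipping anchors that have none."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; dict.get() returns None
        # instead of raising KeyError when the href attribute is absent.
        href = dict(attrs).get("href")
        if href is not None:
            self.hrefs.append(href)


# Hypothetical sample markup: one named anchor without href, then two links.
sample = (
    '<a name="top"></a>'
    '<a href="http://example.com/">absolute</a>'
    '<a href="/about">relative</a>'
)
collector = HrefCollector()
collector.feed(sample)
print(collector.hrefs)
```

With BeautifulSoup the same idea would be link.get('href'), which returns None for a tag that lacks the attribute, where link['href'] raises KeyError.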

Sean M 2010-03-23 17:32:14

Answer 3

+2 A:

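# Python 2 with BeautifulSoup 3
from BeautifulSoup import BeautifulSoup
import re
import urllib2

page = urllib2.urlopen("http://www.linkpages.com")
soup = BeautifulSoup(page)
# Match only <a> tags whose href starts with "http://"; anchors with no
# href cannot match the pattern, so they are skipped without a KeyError.
for link in soup.findAll('a', attrs={'href': re.compile("^http://")}):
    print link  # prints the whole tag; link['href'] gives just the URL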
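
The pattern "^http://" in the snippet above will not match links that start with https://. If those should count as absolute links too, a slightly broader pattern might look like the sketch below. The name ABSOLUTE_HTTP and the sample href values are assumptions for illustration; the code is Python 3 and standard library only.

```python
import re

# Assumed pattern: scheme is http or https, matched case-insensitively.
ABSOLUTE_HTTP = re.compile(r"^https?://", re.IGNORECASE)

# Hypothetical href values of the kinds a page might contain.
candidates = [
    "http://example.com/",
    "https://example.com/a",
    "/relative/path",
    "#top",
    "mailto:someone@example.com",
    "HTTP://EXAMPLE.COM/",
]

# Keep only the values that begin with an http or https scheme.
absolute = [href for href in candidates if ABSOLUTE_HTTP.match(href)]
print(absolute)
```

A compiled pattern like this can be passed as the 'href' value in attrs in the same way as the original one.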

alex vasi 2010-03-23 17:38:05
