ansaurus

Question

How can I use BeautifulSoup to find all the links in a page pointing to a specific domain?

Answer 1

+3 A:

Use SoupStrainer:

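from BeautifulSoup import BeautifulSoup, SoupStrainer
import re

# doc is assumed to hold the page's HTML source as a string

# Find all links
links = SoupStrainer('a')
all_links = [tag for tag in BeautifulSoup(doc, parseOnlyThese=links)]

# Find only the links whose href contains the domain
linkstodomain = SoupStrainer('a', href=re.compile(r'example\.com/'))
domain_links = [tag for tag in BeautifulSoup(doc, parseOnlyThese=linkstodomain)]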

Edit: The example is adapted from the official documentation.

viksit 2010-01-28 00:23:30

I would be more selective with the regex; that one could result in false positives.

Ignacio Vazquez-Abrams 2010-01-28 05:07:33

@Ignacio: right, this example has that caveat. The regex should be as specific as possible to avoid those false positives.

viksit 2010-01-28 07:57:39
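
One way to tighten the match, as the comments above suggest, is to parse each href and compare its hostname instead of searching the whole URL for a substring. The following is a minimal sketch using only the standard library; the function name points_to_domain and the default domain "example.com" are illustrative assumptions, not part of the original answer.

```python
from urllib.parse import urlparse


def points_to_domain(href, domain="example.com"):
    """Return True if href's host is `domain` or one of its subdomains.

    Hypothetical helper for illustration; relative links, missing hrefs,
    and malformed URLs are treated as non-matches.
    """
    if not href:
        return False
    try:
        host = urlparse(href).hostname  # None when the URL has no network location
    except ValueError:
        return False
    if host is None:
        return False
    host = host.lower()
    # Require an exact host match or a dot-separated subdomain, so that
    # "notexample.com" or a query string containing the domain is rejected.
    return host == domain or host.endswith("." + domain)
```

A predicate like this could then replace the compiled regex as the href filter, for example SoupStrainer('a', href=points_to_domain), assuming the installed BeautifulSoup version accepts a callable for attribute matching.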
