I am not too familiar with Python and have to write a script that performs a number of functions. The piece I still need is a way to check a website's source code for links that match a list provided beforehand.

+3  A: 

Generally, you use urllib or urllib2 (or htmllib, etc.) for web programming in Python. You could also use mechanize, curl, etc. Then, for processing the HTML and extracting links, you would want a parser such as BeautifulSoup.
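As a rough illustration of that fetch-then-parse approach, here is a minimal sketch that uses only the Python 3 standard library (urllib.request and html.parser) rather than the modules named above. The names LinkCollector, extract_links, check_links and fetch_html are hypothetical, the sample HTML is made up, and decoding the page as UTF-8 is an assumption.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_links(html):
    """Return all href values found in the given HTML string, in order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


def check_links(html, expected):
    """Map each expected link to True or False depending on whether it appears."""
    found = set(extract_links(html))
    return {link: link in found for link in expected}


def fetch_html(url):
    """Download a page. Decoding as UTF-8 is an assumption about the site."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


# Made-up sample so the sketch runs without network access.
sample = '<p><a href="http://example.com/a">A</a> <a href="/b">B</a></p>'
result = check_links(sample, ["http://example.com/a", "http://example.com/missing"])
```

For a live page you would pass fetch_html(url) instead of sample. Note that this compares href values exactly as written in the page, so relative links such as /b will not match an absolute URL unless you resolve them first.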

ghostdog74
+2  A: 

Matching the links on what? Their HREF attribute? Their display text? Perhaps something like:

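# Python 2 with BeautifulSoup 3 (the "BeautifulSoup" package).
from BeautifulSoup import BeautifulSoup, SoupStrainer
import re
import urllib2

# Download the raw HTML of the page.
doc = urllib2.urlopen("http://somesite.com").read()
# Only parse <a> tags whose href starts with "test".
links = SoupStrainer('a', href=re.compile(r'^test'))
# Parse just those tags and convert each one to its HTML string.
matches = [str(elm) for elm in BeautifulSoup(doc, parseOnlyThese=links)]
for elm in matches:
    print elm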

That will grab the HTML content of somesite.com and then parse it using BeautifulSoup, looking only for links whose HREF attribute starts with "test". It then builds a list of these links and prints them out.

You can adapt this to other matching rules (a different pattern, a different attribute, or the link text) by consulting the BeautifulSoup documentation.
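To make the HREF-versus-display-text distinction concrete, here is a hedged sketch in Python 3 using only the standard library (html.parser and re) instead of BeautifulSoup. AnchorCollector, collect_anchors, links_by_href and links_by_text are hypothetical names, and the sample markup is invented for illustration.

```python
import re
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects (href, text) pairs for each <a> element."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None  # href of the <a> currently open, or None if outside one
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # A missing or valueless href is stored as an empty string.
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def collect_anchors(html):
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.anchors


def links_by_href(html, pattern):
    """Keep anchors whose href matches the regular expression."""
    return [(h, t) for h, t in collect_anchors(html) if re.search(pattern, h)]


def links_by_text(html, pattern):
    """Keep anchors whose display text matches the regular expression."""
    return [(h, t) for h, t in collect_anchors(html) if re.search(pattern, t)]


# Invented sample markup.
sample = (
    '<a href="test/one.html">First</a>'
    '<a href="other.html">Test page</a>'
    '<a href="test/two.html">Second</a>'
)
by_href = links_by_href(sample, r"^test")
by_text = links_by_text(sample, r"(?i)test")
```

Here by_href keeps the two links whose HREF starts with "test", while by_text keeps only the link whose visible text contains "test" in any letter case.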

Nick Presta