Fetch certain .html files from web server

I would like to fetch certain .html files from a web server. Specifically, I want to fetch the .html files on a web site (http://www.thetabworld.com/) whose file names contain the word "metallica". How can I do that in Python? I have heard about urllib2, but as a Python noob I don't have the slightest idea how to use it.

+1 A:

You need to use urllib2 together with an HTML parser such as lxml or BeautifulSoup to extract the links from the retrieved pages, so that you can crawl the site.
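A minimal sketch of that approach, assuming Python 3 (where urllib2's functionality lives in `urllib.request`) and using the standard library's `html.parser` in place of lxml or BeautifulSoup so it runs without extra packages. The function names, the `max_pages` limit, the same-host restriction and the User-Agent string are illustrative choices, not part of the answer above. It collects every `<a href>` on each fetched page, keeps links whose file name ends in `.html` and contains the keyword, and queues same-site links for further crawling.

```python
from collections import deque
from html.parser import HTMLParser
from http.client import HTTPException
import posixpath
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_url, html_text):
    """Return the page's links as absolute URLs, fragments removed, in order."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    links = []
    for href in collector.hrefs:
        absolute = urldefrag(urljoin(page_url, href)).url
        if absolute not in links:  # drop duplicates, keep first occurrence
            links.append(absolute)
    return links


def is_match(url, keyword):
    """True if the URL's file name ends in .html and contains keyword."""
    filename = posixpath.basename(urlparse(url).path).lower()
    return filename.endswith(".html") and keyword.lower() in filename


def crawl(start_url, keyword, max_pages=50):
    """Breadth-first crawl of start_url's host; return URLs matching keyword."""
    host = urlparse(start_url).netloc
    queue, seen, matches = deque([start_url]), {start_url}, []
    requests_made = 0
    while queue and requests_made < max_pages:
        url = queue.popleft()
        requests_made += 1  # every attempted request counts toward the limit
        request = Request(url, headers={"User-Agent": "link-crawler-sketch/0.1"})
        try:
            with urlopen(request, timeout=10) as response:
                if response.headers.get_content_type() != "text/html":
                    continue  # not an HTML page, so there are no links to follow
                charset = response.headers.get_content_charset() or "utf-8"
                html_text = response.read().decode(charset, errors="replace")
        except (OSError, ValueError, LookupError, HTTPException):
            continue  # unreachable page, HTTP error, timeout, bad URL or charset
        for link in extract_links(url, html_text):
            parts = urlparse(link)
            if parts.scheme not in ("http", "https") or parts.netloc != host:
                continue  # stay on the starting site
            if is_match(link, keyword) and link not in matches:
                matches.append(link)
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return matches
```

For example, `crawl("http://www.thetabworld.com/", "metallica")` would return the matching URLs it finds within the page limit; each one can then be downloaded with `urlopen` to save the file itself. A crawler like this only finds pages that are reachable through links from the start page.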

Ignacio Vazquez-Abrams 2010-01-19 19:38:56

+1 A:

"I have heard about urllib2 but as a python noob, I don't have a slightest idea how to use it."

Well, if you don't know how to use urllib2, reading some docs would be a good start.

The following are excellent resources (with examples):

official python docs for urllib2
urllib2 - the missing manual
urllib2 cookbook
PyMOTW - urllib2
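For reference, a basic fetch looks roughly like the sketch below. It assumes Python 3, where urllib2 was merged into `urllib.request` and `urllib.error`; `fetch_html`, `try_fetch_html` and the User-Agent string are hypothetical names chosen for illustration, not taken from the resources listed above.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Download url and return its body decoded as text."""
    request = Request(url, headers={"User-Agent": "fetch-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def try_fetch_html(url, timeout=10):
    """Like fetch_html, but print the problem and return None on failure."""
    try:
        return fetch_html(url, timeout)
    except HTTPError as err:  # server answered with a 4xx/5xx status
        print(f"HTTP {err.code} for {url}")
    except URLError as err:  # DNS failure, refused connection, unknown scheme
        print(f"could not reach {url}: {err.reason}")
    except OSError as err:  # other socket errors, e.g. a timeout while reading
        print(f"error reading {url}: {err}")
    return None
```

`HTTPError` is caught before `URLError` because it is a subclass of it, and `URLError` in turn is a subclass of `OSError`.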

Corey Goldberg 2010-01-19 20:34:53

RTFM is not a very helpful response

Steve McLeod 2010-09-30 16:43:26

Steve, my answer gave 4 useful links to the best resources on urllib2, and it was accepted by the OP. So I would call it a "helpful response".

Corey Goldberg 2010-09-30 16:57:21
