I would like to fetch certain .html files from a web server. Specifically, I want to fetch the .html files on a web site (http://www.thetabworld.com/) whose file names contain the word "metallica". How can I do that using Python? I have heard about urllib2, but as a Python noob, I don't have the slightest idea how to use it.
A:
You need to use urllib2 together with an HTML parser such as lxml or BeautifulSoup to extract the links from the retrieved pages, so that you can crawl the site.
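A minimal sketch of that approach follows. It assumes Python 3, where urllib2's functionality lives in `urllib.request`, and it swaps in the standard library's `html.parser` for lxml/BeautifulSoup so there are no third-party dependencies. The helper names (`LinkParser`, `extract_links`, `is_wanted`, `crawl`), the same-host restriction, the `max_pages` limit and the UTF-8 decoding are illustrative assumptions, not anything the site or the answer prescribes. The page fetcher is passed in as a parameter so the crawler can be exercised without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen
import posixpath


class LinkParser(HTMLParser):
    """Collect the href values of all <a> tags on a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return absolute URLs for every <a href> found in html."""
    parser = LinkParser()
    parser.feed(html)
    # urljoin resolves relative links against the page they came from.
    return [urljoin(base_url, href) for href in parser.links]


def is_wanted(url, keyword="metallica"):
    """True if the URL's file name ends in .html and contains keyword."""
    filename = posixpath.basename(urlparse(url).path).lower()
    return filename.endswith(".html") and keyword in filename


def fetch(url):
    """Download a page; assumes UTF-8, adjust if the site uses another encoding."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=50, keyword="metallica"):
    """Breadth-first crawl that stays on start_url's host.

    Returns the set of matching .html URLs discovered.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    found = set()
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:  # urllib.error.URLError is a subclass of OSError
            continue
        fetched += 1
        for link in extract_links(html, url):
            link = urldefrag(link).url  # drop "#fragment" parts
            # Skip other hosts (and mailto:/javascript: links) and repeats.
            if urlparse(link).netloc != host or link in seen:
                continue
            seen.add(link)
            if is_wanted(link, keyword):
                found.add(link)
            queue.append(link)
    return found
```

Calling `crawl("http://www.thetabworld.com/")` would walk the site and return the matching URLs; keep `max_pages` small while experimenting so you don't hammer the server. Each URL in the result can then be downloaded with `fetch`.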
Ignacio Vazquez-Abrams
2010-01-19 19:38:56
A:
"I have heard about urllib2 but as a python noob, I don't have a slightest idea how to use it."
Well, if you don't know how to use urllib2, reading some docs would be a good start.
The following are excellent resources (with examples):
Official Python docs for urllib2
urllib2 - the missing manual
urllib2 cookbook
PyMOTW - urllib2
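As a small taste of what those resources cover, here is a sketch of fetching one page and saving it to disk. It assumes Python 3, where urllib2 was merged into `urllib.request` (in Python 2 the equivalents are `urllib2.Request` and `urllib2.urlopen`). The `download` function and its default User-Agent string are hypothetical, chosen only for illustration.

```python
import os
import posixpath
from urllib.parse import unquote, urlparse
from urllib.request import Request, urlopen


def download(url, dest_dir=".", user_agent="Mozilla/5.0 (compatible; tab-fetcher)"):
    """Fetch url and save the raw bytes under dest_dir; return the saved path.

    The user_agent default is a made-up example value.
    """
    # Some servers may reject or throttle the default Python-urllib
    # User-Agent, so send a custom one in the request headers.
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        data = response.read()
    # Name the local file after the last path component of the URL.
    filename = unquote(posixpath.basename(urlparse(url).path)) or "index.html"
    path = os.path.join(dest_dir, filename)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

For example, `download("http://www.thetabworld.com/some/metallica_tab.html")` (a hypothetical URL) would save `metallica_tab.html` in the current directory.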
Corey Goldberg
2010-01-19 20:34:53
RTFM is not a very helpful response
Steve McLeod
2010-09-30 16:43:26
Steve, my answer gave 4 useful links to the best resources on urllib2, and it was accepted by the OP. So I would call it a "helpful response".
Corey Goldberg
2010-09-30 16:57:21