Hi - You probably know that IE has a feature where you can save a web page, and it will automatically download the HTML file and all the image/CSS/JS files that the HTML file uses.

There is one problem with this: the links in the HTML file are not rewritten properly. If I download a page from example.com that contains <a href=/hi.html>, the copy that IE saved will have a link to C:\Documents and Settings...(path to the folder that the html file is in).

Is there a Python library that will download an HTML page for me, along with everything it uses (images/JS/CSS)? If yes, is there a library that will also fix the links for me?

Thanks!!

+5  A: 

Since you mention IE specifically, I'm not sure whether this will be of any use to you, but on Linux the easiest way to completely mirror a website is the wget command.

wget --mirror --convert-links -w 1 http://www.example.com

Run man wget if you need more options.
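
If you would rather handle the link-fixing part in Python yourself, here is a rough sketch using only the standard library. It is not what wget does internally: it simply turns relative href/src values into absolute URLs pointing back at the original site, so the saved page no longer links into a local folder. The function name absolutize_links is made up for this example, and the regular expression is a heuristic, not a real HTML parser, so treat it as a starting point only.

```python
import re
from urllib.parse import urljoin

# Rough heuristic: match href= or src= followed by a double-quoted,
# single-quoted, or unquoted value. This is NOT a full HTML parser.
_ATTR = re.compile(
    r'''(\b(?:href|src)\s*=\s*)(?:"([^"]*)"|'([^']*)'|([^\s>]+))''',
    re.IGNORECASE,
)

def absolutize_links(html, base_url):
    """Rewrite href/src values as absolute URLs resolved against base_url.

    absolutize_links is a hypothetical helper name, not a library function.
    """
    def fix(match):
        prefix = match.group(1)
        # Exactly one of the three value groups matched; the others are None.
        value = match.group(2) or match.group(3) or match.group(4) or ""
        # urljoin leaves already-absolute URLs unchanged.
        return '{}"{}"'.format(prefix, urljoin(base_url, value))
    return _ATTR.sub(fix, html)

if __name__ == "__main__":
    page = '<a href=/hi.html>hi</a> <img src="img/logo.png">'
    print(absolutize_links(page, "http://www.example.com/"))
```

With the link from the question, <a href=/hi.html> comes out as <a href="http://www.example.com/hi.html">. You would still need to fetch the page and its images/CSS/JS yourself, which is the part wget already does for you.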

Werner
Don't let Windows stop you from using wget: http://gnuwin32.sourceforge.net/
Tom