We have often burned static/archived copies of our ASP.NET websites to disc for customers. Until now we have used WebZip, but we have had endless problems with crashes, downloaded pages not being re-linked correctly, and so on.

We basically need an application that crawls our ASP.NET website and downloads static copies of everything on it (pages, images, documents, CSS, etc.), then post-processes the downloaded pages so they can be browsed locally without an internet connection (rewriting absolute URLs in links, and so on). The more idiot-proof, the better. This seems like a pretty common and relatively simple process, but I have tried a few other applications and have been really unimpressed.
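To make the "rewriting absolute URLs" step concrete, here is a minimal sketch of that kind of rewrite (www.example.com stands in for the archived site's domain; real archiving tools do this far more robustly than a sed one-liner):

```shell
# Rewrite absolute links to relative ones so pages resolve offline.
# www.example.com is a placeholder for the archived site's domain.
echo '<a href="http://www.example.com/docs/page.html">Docs</a>' |
  sed 's|http://www\.example\.com/|./|g'
# prints: <a href="./docs/page.html">Docs</a>
```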

Does anyone have archive software they would recommend? Does anyone have a really simple process they would share? Thanks in advance, Shane

A: 

I just use: wget -m <url>.

Aram Verstegen
+5  A: 

You could use wget:

wget -m -k -K -E http://url/of/web/site
chuckg
From the --help, I can see what the rest do, but what do the flags K (capital) and E do?
matthews
Don't forget the -p switch to get images and other embedded objects, too. (-E converts pages to an .html extension; -K backs up each original file as .orig before link conversion.)
migu
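Putting the flags from this answer and its comments together, the full command might look like the following sketch (example.com is a placeholder; the tiny wrapper function is hypothetical and just assembles the command as a string so it can be inspected before running):

```shell
# Assemble the full mirror command discussed above:
#   -m  mirror: recursive download with timestamping
#   -k  convert links so the copy browses locally
#   -K  keep each pre-conversion file with a .orig suffix
#   -E  save pages with an .html extension
#   -p  also fetch page requisites (images, CSS, etc.)
mirror_cmd() {
  echo "wget -m -k -K -E -p $1"
}
mirror_cmd "http://example.com/"
# prints: wget -m -k -K -E -p http://example.com/
```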
+2  A: 

I use Blue Crab on OSX and WebCopier on Windows.

Syntax
+1  A: 

wget -r -k

... and investigate the rest of the options. I hope you've followed these guidelines: http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html, so that all your resources are safe to retrieve with GET requests.

Joel Hoffman
+3  A: 

On Windows, you can look at HTTrack. It's very configurable, allowing you to set the speed of the downloads, but you can also just point it at a website and run it with no configuration at all.

In my experience it's been a really good tool and works well. Some of the things I like about HTTrack are:

  • Open Source license
  • Resumes stopped downloads
  • Can update an existing archive
  • You can configure it to be non-aggressive when it downloads, so it doesn't waste your bandwidth or the site's
Jesse Dearing
Thanks for the recommendation - it worked great for me - just what I was looking for.
jskunkle
HTTrack also exists for Linux.
dusoft
A: 

I've been using HTTrack for several years now. It handles all of the inter-page linking, etc. just fine. My only complaint is that I haven't found a good way to limit it to a sub-site. For instance, if there is a site www.foo.com/steve that I want to archive, it will likely follow links to www.foo.com/rowe and archive that too. Otherwise it's great: highly configurable and reliable.

Steve Rowe
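For the sub-site problem described above, HTTrack's scan rules (its +/- URL filter patterns, documented in its manual) can constrain the crawl. A sketch, assembled as a string here rather than executed against the network (./steve-archive is a placeholder output directory; the paths are the ones from the answer):

```shell
# Constrain HTTrack to www.foo.com/steve using scan rules; rules are
# evaluated in order, so the - rule blocks the sibling /rowe section.
# ./steve-archive is a placeholder output directory.
cmd='httrack "http://www.foo.com/steve/" -O ./steve-archive "+www.foo.com/steve/*" "-www.foo.com/rowe/*"'
echo "$cmd"
```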