tags:
views: 8
answers: 1

I would like to be able to run a script (or something) that will "download" a certain webpage (HTML) and all of its attachments (Word docs) so that I can keep and work with a private copy.

Here is the story... There is a site that I use a lot for research. On this site there are many HTML pages that contain text and download links to documents (.pdf and .doc files). There is a threat that the owner (the US government) is going to 'privatize' the information, which I think is bogus. The threat is there nonetheless, though. I would like to be able to extract all the HTML text and copies of all the attachments so that I can host my own version of the data (on my desktop) for personal use, just in case. Is there a simple way to do this?

Note: I do not have FTP access to this web server, only access to the individual web pages and attachments.

A: 

There are a ton of programs out there that can do this; a Google search for "offline browser" will turn up quite a few. I wouldn't be too keen to reinvent the wheel, but for a self-built solution I would probably use the cURL library for PHP. Then again, it depends on what programming languages you're familiar with.
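Just to give an idea, here is a rough sketch of what such a PHP/cURL script could look like: it fetches one page, saves the HTML, and downloads any linked .pdf/.doc files. The URL, the output directory, and the naive relative-link handling are placeholders you would have to adapt to the actual site.

    <?php
    // Fetch a URL with cURL and return the response body (or false on failure).
    function fetchUrl($url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
            CURLOPT_FOLLOWLOCATION => true,   // follow redirects
            CURLOPT_USERAGENT      => 'Mozilla/5.0 (personal archive script)',
        ]);
        $body = curl_exec($ch);
        curl_close($ch);
        return $body;
    }

    $pageUrl = 'https://example.gov/research/index.html'; // placeholder URL
    $outDir  = __DIR__ . '/archive';
    if (!is_dir($outDir)) {
        mkdir($outDir, 0777, true);
    }

    $html = fetchUrl($pageUrl);
    if ($html === false) {
        die("Could not fetch $pageUrl\n");
    }

    // Save the HTML page itself.
    file_put_contents($outDir . '/index.html', $html);

    // Find links that look like document attachments and download each one.
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from messy real-world markup
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if (!preg_match('/\.(pdf|docx?)$/i', $href)) {
            continue;
        }
        // Resolve relative links against the page URL (naive join, fine for simple sites).
        $absolute = preg_match('#^https?://#i', $href)
            ? $href
            : rtrim(dirname($pageUrl), '/') . '/' . ltrim($href, '/');

        $data = fetchUrl($absolute);
        if ($data !== false) {
            file_put_contents($outDir . '/' . basename($href), $data);
            echo "Saved " . basename($href) . "\n";
        }
    }

To cover a whole site rather than a single page, you would wrap this in a loop over the pages you care about (or a simple crawler that follows internal links), but the fetch-parse-download pattern stays the same.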

Hope this helps.

FreekOne