
I need to automate something like this:

  1. Open a URL
  2. Wait until the page is fully loaded
  3. Save the COMPLETE page as... (I can provide a file name).

I saw https://developer.mozilla.org/en/Command_Line_Options, but I can't find an option to invoke the "Save Page As..." command in "Web Page, complete" mode. That mode would give me all the CSS, JS, XML and related files needed to display the page.

I know some Python, which I could use if I find a way to "talk" to Firefox. The webbrowser module doesn't help here, since it offers no way to save a page: http://docs.python.org/library/webbrowser.html

I am open to any kind of solution.

Platform: Linux, but I could use another if there is no other way.

Important: I can't just retrieve the HTML returned by the web server, since I need all the CSS, JS, images and other files that are used to display the page as rendered by the browser. For example, an image may not be linked in the HTML at all, but instead referenced by JS that runs when the page is rendered. The only way I can think of to retrieve such an image is to execute the page as a browser would and then collect all files from the resulting page (not the original page).
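The second half of that idea, collecting resource URLs from the rendered page instead of the server's HTML, could start with a standard-library sketch like this one. `collect_resources`, `ResourceCollector` and the tag list are illustrative assumptions. The sketch only reads `src` attributes and stylesheet/icon `<link>` tags, ignores `url(...)` references inside CSS, and finds script-inserted resources only if it is given the DOM as serialized after the scripts have run.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of files a page loads (scripts, images, CSS...)."""

    # Tags whose `src` usually points at a page requisite.
    # Illustrative assumption, not an exhaustive list.
    SRC_TAGS = {"img", "script", "iframe", "audio", "video", "source", "embed"}

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        attr = dict(attrs)
        if tag in self.SRC_TAGS and attr.get("src"):
            self._add(attr["src"])
        elif tag == "link" and attr.get("href"):
            # rel may hold several tokens, e.g. "shortcut icon".
            rel = (attr.get("rel") or "").lower().split()
            if "stylesheet" in rel or "icon" in rel:
                self._add(attr["href"])

    def _add(self, ref: str) -> None:
        # Resolve relative references against the page URL; keep first-seen order.
        absolute = urljoin(self.base_url, ref)
        if absolute not in self.urls:
            self.urls.append(absolute)


def collect_resources(html: str, base_url: str) -> List[str]:
    """Return the resource URLs referenced by `html`, resolved against `base_url`."""
    parser = ResourceCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

Plain `<a href>` links are deliberately skipped, because they are navigation rather than files needed to display the page.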

A:

Maybe something from the Selenium collection of tools will work for you.

Selenium IDE is an integrated development environment for Selenium scripts. It is implemented as a Firefox extension, and allows you to record, edit, and debug tests. Selenium IDE includes the entire Selenium Core, allowing you to easily and quickly record and play back tests in the actual environment that they will run.
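Selenium IDE itself is a recorder. To script the question's open/wait/save steps from Python, Selenium's WebDriver bindings are the more direct fit. The following is a minimal sketch that assumes Selenium 4 and geckodriver are installed. `save_rendered_html` and its `make_driver` hook are hypothetical names, and the hook exists only so that another browser or a test double can be swapped in.

```python
from typing import Any, Callable, Optional


def save_rendered_html(url: str, path: str,
                       make_driver: Optional[Callable[[], Any]] = None) -> str:
    """Open `url` in Firefox via WebDriver, wait for it to load, save the current DOM.

    Assumes Selenium 4 + geckodriver. `make_driver` is a hypothetical hook.
    """
    if make_driver is None:
        # Imported lazily so the function can be exercised without Selenium.
        from selenium import webdriver
        make_driver = webdriver.Firefox
    driver = make_driver()
    try:
        # With the default "normal" page-load strategy, get() returns once the
        # document's readyState is "complete" (the load event has fired).
        driver.get(url)
        # page_source serializes the DOM as it is now, after scripts have run.
        # Content fetched later by AJAX may need an explicit wait before this.
        html = driver.page_source
    finally:
        driver.quit()
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(html)
    return html
```

Usage: `save_rendered_html("http://example.com/", "example.html")`. This saves only the rendered HTML. The CSS, images and scripts it references would still have to be downloaded in a separate step.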

Pekka
A: 

If you're trying to save a URL to a file in Python, a good option could be urllib.urlretrieve (urllib.request.urlretrieve in Python 3).

http://docs.python.org/library/urllib.html
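A minimal Python 3 sketch of that suggestion follows. `save_url` is a hypothetical wrapper name. `urllib.request.urlretrieve` is documented as a legacy interface, but it is still present and returns a `(filename, headers)` tuple. It saves only the single response body, so it does not run JavaScript or fetch the CSS and images the page uses.

```python
import urllib.request


def save_url(url: str, filename: str) -> str:
    """Save whatever is served at `url` into `filename` and return the path.

    Raw response only: no JavaScript is executed and no referenced
    CSS, images or scripts are downloaded.
    """
    # urlretrieve returns (local_filename, headers); keep the filename.
    local_path, _headers = urllib.request.urlretrieve(url, filename)
    return local_path
```

Usage: `save_url("http://example.com/", "example.html")`.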

Gonzalo
A: 

You can use bash and wget.
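With wget, the relevant switches are the page-requisite ones. The sketch below drives wget from Python, the language the question mentions, and assumes GNU Wget is on PATH. The flags are documented wget options, and the GNU Wget manual suggests a similar combination (`-E -H -k -K -p`) for fetching one page with everything needed to display it. `wget_page_command` and `save_with_wget` are hypothetical helper names. wget does not execute JavaScript, so files requested only by scripts (the case the question stresses) will still be missed.

```python
import shutil
import subprocess
from typing import List


def wget_page_command(url: str, out_dir: str) -> List[str]:
    """Build a wget command that fetches one page plus its requisites."""
    return [
        "wget",
        "--page-requisites",   # -p: images, stylesheets etc. referenced in the HTML
        "--convert-links",     # -k: rewrite links to point at the local copies
        "--adjust-extension",  # -E: add .html/.css suffixes where needed
        "--span-hosts",        # -H: allow requisites hosted on other domains
        f"--directory-prefix={out_dir}",  # -P: where to store the files
        url,
    ]


def save_with_wget(url: str, out_dir: str) -> None:
    """Run wget for `url`, raising if wget is missing or exits non-zero."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget is not installed or not on PATH")
    subprocess.run(wget_page_command(url, out_dir), check=True)
```

Usage: `save_with_wget("http://example.com/", "saved_page")`.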

Asar
A: 

Other alternatives to Selenium can be used, as can something written on top of it.

pyfunc