Hi all, I was wondering what's the best way to save all the files that are retrieved when Selenium visits a site. In other words, when Selenium visits http://www.google.com I want to save the HTML, JavaScript (including scripts referenced in src tags), images, and potentially content contained in iframes. How can this be done?

I know getHTMLSource() will return the HTML content in the body of the main frame, but how can this be extended to download the complete set of files needed to render that page again? Thanks in advance!

A: 

Selenium isn't designed for this; you could either:

  1. Use getHtmlSource and parse the resulting HTML for references to external files, which you can then download and store outside of Selenium (a rough sketch of this follows the list).
  2. Use something other than Selenium to download and store an offline version of a website; I'm sure there are plenty of tools that could do this if you search. For example, Wget can perform a recursive download (http://en.wikipedia.org/wiki/Wget#Recursive_download).
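For option 1, a minimal sketch of the parse-and-download step in Python might look like the following. It assumes you already have the page source as a string from Selenium; the commented-out usage at the end assumes the old Selenium RC Python client (selenium.selenium with get_html_source), so adjust it to driver.page_source if you are on WebDriver. It only looks at src attributes on script/img/iframe tags and href on link tags, and it does not recurse into the files it downloads:

    import os
    import urllib.request
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse

    class ResourceCollector(HTMLParser):
        """Collect URLs referenced by script/img/iframe src attributes and link hrefs."""

        def __init__(self):
            super().__init__()
            self.resources = []

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag in ("script", "img", "iframe") and attrs.get("src"):
                self.resources.append(attrs["src"])
            elif tag == "link" and attrs.get("href"):
                self.resources.append(attrs["href"])

    def save_page(base_url, html, out_dir="saved_page"):
        os.makedirs(out_dir, exist_ok=True)
        # Save the main document first.
        with open(os.path.join(out_dir, "index.html"), "w", encoding="utf-8") as f:
            f.write(html)
        collector = ResourceCollector()
        collector.feed(html)
        for ref in collector.resources:
            url = urljoin(base_url, ref)  # resolve relative references against the page URL
            name = os.path.basename(urlparse(url).path) or "resource"
            try:
                urllib.request.urlretrieve(url, os.path.join(out_dir, name))
            except OSError as exc:
                print("could not fetch %s: %s" % (url, exc))

    # Example usage (assumes a Selenium RC server running on port 4444):
    # from selenium import selenium
    # sel = selenium("localhost", 4444, "*firefox", "http://www.google.com/")
    # sel.start()
    # sel.open("/")
    # save_page("http://www.google.com/", sel.get_html_source())

For option 2, something like "wget -r -l 1 -p -k http://www.google.com/" fetches the page plus its requisites (images, CSS, scripts) and rewrites the links so the saved copy renders locally, although wget will not execute any JavaScript.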

Is there any reason you want to use Selenium? Is this part of your testing strategy, or do you just want a tool that will create an offline copy of a page?

Dave Hunt
The reason we want to use Selenium is that it executes JavaScript, which is essential for reconstructing an entire page (including ad traffic).
Rick