I need a scalable, automated method of dumping the contents of "view page source", after manipulation, to a file. This non-interactive method would be (more or less) equivalent to an army of humans navigating my list of URLs and dumping "view page source" to a file. Programs such as wget or curl will non-interactively retrieve a set of URLs, but they do not execute JavaScript or any of that 'fancy stuff'.

My ideal solution looks like any of the following (fantasy solutions):

cat urls.txt | google-chrome --quiet --no-gui \
--output-sources-directory=~/urls-source  
(fantasy command line, no idea if flags like these exist)

or

cat urls.txt | python -c "import some-library; \
... use some-library to process urls.txt ; output sources to ~/urls-source"    

As a secondary concern, I also need:

  • dump all included JavaScript source to a file (à la Firebug)
  • dump a PDF/image of the page to a file (print to file)
+1  A: 

HtmlUnit does execute JavaScript. I'm not sure whether you can obtain the HTML code after DOM manipulation, but give it a try.

You could write a little Java program that fits your requirements and execute it from the command line, as in your examples.

I haven't tried the code below, I just had a look at the Javadoc:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class DumpPage {
    public static void main(String[] args) throws IOException {
        String pageURL = args[0];

        WebClient webClient = new WebClient();
        HtmlPage page = webClient.getPage(pageURL);

        // asXml() serializes the DOM as it stands after JavaScript has run;
        // asText() would only return the visible text
        String pageContents = page.asXml();

        // Save the resulting page to a file
        Files.write(Paths.get("page-source.html"), pageContents.getBytes());
    }
}
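
For the batch case in the question (a whole list of URLs rather than a single one), a rough, untested sketch along the same lines could look like the following; the urls.txt location, the output directory, and the file-naming scheme are my own assumptions, not anything HtmlUnit prescribes.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class DumpSources {
    public static void main(String[] args) throws IOException {
        // Assumed layout: urls.txt in the working directory, output under ./urls-source
        List<String> urls = Files.readAllLines(Paths.get("urls.txt"), StandardCharsets.UTF_8);
        Path outDir = Files.createDirectories(Paths.get("urls-source"));

        WebClient webClient = new WebClient();
        int i = 0;
        for (String url : urls) {
            HtmlPage page = webClient.getPage(url);
            // The DOM as it stands after JavaScript has run
            String source = page.asXml();
            Files.write(outDir.resolve("page-" + (i++) + ".html"),
                        source.getBytes(StandardCharsets.UTF_8));
        }
    }
}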

EDIT:

It seems Selenium (another web testing framework) can take page screenshots.

Search for selenium.captureScreenshot.
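
As a rough illustration of that suggestion, here is a minimal, untested sketch against the Selenium RC (Selenium 1) Java client; the host, port, browser string, and screenshot path are placeholder assumptions, and it requires a Selenium RC server to be running.

import com.thoughtworks.selenium.DefaultSelenium;
import com.thoughtworks.selenium.Selenium;

public class ScreenshotSketch {
    public static void main(String[] args) {
        // Assumes a Selenium RC server is already running on localhost:4444
        Selenium selenium = new DefaultSelenium("localhost", 4444, "*firefox", "http://example.com/");
        selenium.start();
        selenium.open("/");

        // getHtmlSource() returns the DOM as HTML after JavaScript has run
        System.out.println(selenium.getHtmlSource());

        // captureScreenshot() writes a PNG on the machine running the RC server
        selenium.captureScreenshot("/tmp/page.png");

        selenium.stop();
    }
}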

mexique1
+1 for the selenium
Konerak
A: 

You can use the IRobotSoft web scraper to automate this. The page source is in the UpdatedPage variable; you only need to save that variable to a file.

It also has a function, CapturePage(), to capture the web page to an image file.

seagulf