views:

80

answers:

2

Hi,

I am looking for a way to, given a URL, get the source of a webpage back after the JavaScript has been run on it. For example:

I have a webpage with a <div>. On loading the page, some JavaScript populates the div. Viewing the source of the page through a browser does not show the content inside the div.

As far as I know, for the browser to render the page, the div must have been filled with (X|D)HTML, which would mean that the page after rendering is still just nested markup. So theoretically there should be a "final" version of the page source.

I have considered using a rendering engine like WebKit or Gecko and somehow adapting it to do this. However, this is a fairly large task, and I don't really want to duplicate something which has already been done. Does anyone know of a way of performing this task?

Regards.

Update: I am aiming to use Selenium (as mentioned in the comments to the accepted answer) to do this automatically for several pages. My project is a web spider which by design needs to target a number of pages in which the content I am aiming to reach is not available until after the JavaScript has populated everything.
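For anyone after the same thing, here is a minimal sketch of how the Selenium approach might look in Python. It assumes the `selenium` package plus Firefox and geckodriver are installed. The function names, the `page_NNN.html` file naming and the output directory are made up for illustration. `page_source` is expected to return the browser's current DOM rather than the raw response, but if a page fills its content asynchronously after load, an explicit wait before reading it would probably be needed.

```python
from pathlib import Path


def rendered_source(driver, url):
    """Load url in the given WebDriver and return the page source afterwards."""
    driver.get(url)
    # Read after the load, so script-inserted markup should be included.
    return driver.page_source


def save_rendered_pages(driver, urls, out_dir):
    """Write the rendered source of each URL to out_dir/page_NNN.html (hypothetical naming)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, url in enumerate(urls):
        path = out / f"page_{i:03d}.html"
        path.write_text(rendered_source(driver, url), encoding="utf-8")
        paths.append(path)
    return paths


def crawl_with_firefox(urls, out_dir):
    """Convenience wrapper; assumes selenium, Firefox and geckodriver are available."""
    from selenium import webdriver  # imported lazily so the helpers work without it

    driver = webdriver.Firefox()
    try:
        return save_rendered_pages(driver, urls, out_dir)
    finally:
        driver.quit()  # always close the browser, even on errors
```

Calling `crawl_with_firefox(["http://example.com/"], "rendered")` should leave one HTML file per URL in the `rendered` directory, ready to be parsed.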

A: 

Within Firefox you can get the final rendered DIV by waiting for the browser to finish rendering, then pressing Ctrl-A to select all content on the page, and finally selecting "View Selection Source" from the right-click menu.

This shows you the manipulated/populated DOM-code of the page.

Kosi2801
Thank you. This has the information I am looking for. Do you know if there is a way in which I could automate this and output to (for example) a text file? I want to parse various things from this source on several pages, and manually copying them into files to parse is unrealistically time-consuming.
C A
You could have a look at the Selenium Web Test Framework (http://seleniumhq.org) which is for automating Web Tests. I don't know if it contains something to write out the DOM source but I think the chances are not too bad.
Kosi2801
I've been playing around with Selenium and it looks like something I can use to do what I'm looking for. It's not quite designed for the task, so I'll have to work around some of it, but it is certainly capable of doing what I need it to. Thanks.
C A
+1  A: 

Firefox add-ons such as the Web Developer toolbar or Firebug have options like 'View generated source'.

As far as timing goes, just about the only option you have is a snippet of JavaScript code. You could record a start time as early as possible during the page load, and check again when the page is complete (either at DOM-ready or once the page has completely downloaded). The result is going to be highly variable, however, and if you are timing it in order to improve the speed (which is good to know, and to do), just getting Firebug + YSlow would be far more useful.
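If the page is being driven from a script anyway, a rough sketch of the same start/ready/loaded measurement could look like this in Python. Note that it swaps the hand-written start-time snippet for the browser's Navigation Timing data (`window.performance.timing`), read through a Selenium-style `execute_script` call. The function names and dictionary keys are made up for illustration, and `loadEventEnd` may still be 0 if it is read too early, so treat the numbers as approximate.

```python
# JavaScript executed inside the page; it reads the browser's own timestamps.
TIMING_SCRIPT = """
var t = window.performance.timing;
return {start: t.navigationStart,
        domReady: t.domContentLoadedEventEnd,
        loaded: t.loadEventEnd};
"""


def load_durations(timing):
    """Turn absolute millisecond timestamps into durations since navigation start."""
    return {
        "dom_ready_ms": timing["domReady"] - timing["start"],
        "loaded_ms": timing["loaded"] - timing["start"],
    }


def time_page(driver, url):
    """Load url with a Selenium-style driver and report how long it took."""
    driver.get(url)
    # Caveat: if the load event has not finished yet, loadEventEnd is 0
    # and loaded_ms will come out negative.
    return load_durations(driver.execute_script(TIMING_SCRIPT))
```

Running this several times per page and comparing the spread gives a feel for how variable the numbers are.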

Alister Bulman
Both of these should be part of a developer's arsenal. +1
Paolo Bergantino