Are there any programming libraries available that will parse an HTML document, execute its JavaScript, and then allow me to navigate the resulting DOM? This needs to be performed server side, not client side. Any language will do, but Java, PHP, or Ruby are preferred.

A: 

PHP has DOMDocument for navigating the DOM. I haven't heard of anything for executing JavaScript.
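
For the Java side of the question, the DOM-navigation half alone can be sketched with the standard library. This is only an illustration, not PHP's DOMDocument: it assumes the input is well-formed XHTML (the JDK parser rejects typical tag-soup HTML), it executes no JavaScript, and the class and method names are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class: parses well-formed XHTML and walks the DOM.
public class DomNavigationSketch {

    // Returns the href attribute of every <a> element, in document order.
    public static List<String> linkTargets(String xhtml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        NodeList anchors = doc.getElementsByTagName("a");
        List<String> hrefs = new ArrayList<>();
        for (int i = 0; i < anchors.getLength(); i++) {
            Element anchor = (Element) anchors.item(i);
            hrefs.add(anchor.getAttribute("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample markup; no DOCTYPE, so nothing is fetched remotely.
        String page = "<html><body><p>See <a href=\"/docs\">docs</a> and "
                + "<a href=\"/faq\">FAQ</a>.</p></body></html>";
        System.out.println(linkTargets(page)); // prints [/docs, /faq]
    }
}
```

Anything a script would have added to the page is missing from this DOM, which is exactly the gap the question is about.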

John Conde
+2  A: 

Java can run JavaScript through Rhino. Also see this page for server-side JavaScript solutions: http://en.wikipedia.org/wiki/Server-side_JavaScript

Pepijn
+3  A: 

Have you tried Bringing the Browser to the Server?

Luca Matteis
+1, beat me to it.
Gaby
+1 Forgot about that one... On my Mac I'd just use Python's AppleScript capability to run the JS straight in Safari though.
Pepijn
Links for the updated community version: http://www.envjs.com/ and http://github.com/thatcher/env-js
machine elf
+1  A: 

In Java: http://lobobrowser.org/cobra/java-html-parser.jsp
This is a JavaScript-aware, CSS-aware HTML parser.
The most important feature in relation to your question: it is JavaScript-aware. DOM modifications that occur during parsing will be reflected in the resulting DOM.

Stefan De Boey
A: 

Start from this post and follow the links. Or just search for Rhino.

NilColor
Oh... same link as Luca Matteis gave... Sorry!
NilColor
+1  A: 

For Java, be sure to check out HtmlUnit and HttpUnit.

Kohsuke Kawaguchi
A: 

You should look at Aptana Jaxer, a server-side JavaScript implementation. You can run any JavaScript and perform DOM manipulation on the server side just as you would on the client side. It supports all JS libraries, including jQuery, MooTools, etc., on the server side.

Ramesh Vel