views: 409
answers: 3

I am building a web application crawler that is meant not only to find all the links or pages in a web application, but also to perform all the allowed actions in the app (such as pushing buttons, filling in forms, and noticing changes in the DOM even when they do not trigger a request).

Basically, this is a kind of "browser simulator".
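One way to make "perform all the allowed actions" and "notice DOM changes that did not trigger a request" concrete is to treat the crawl as a breadth-first search over UI states, keying each state by a hash of its serialized DOM rather than by its URL. The following is a minimal Python sketch under that assumption. It presumes some browser engine can return the serialized DOM and apply an action; the `actions`/`perform` callbacks and the toy app are hypothetical stand-ins for that engine, not any real toolkit's API.

```python
import hashlib
from collections import deque
from typing import Callable, Dict, Iterable, List, Tuple

# Assumption: a "state" is the serialized DOM reported by some browser
# engine, and an "action" is a label such as "click:#btn". Both are
# plain strings here; a real engine binding is hypothetical.
State = str
Action = str


def fingerprint(dom: State) -> str:
    """Hash the serialized DOM so identical states are visited only once."""
    return hashlib.sha256(dom.encode("utf-8")).hexdigest()


def explore(
    start: State,
    actions: Callable[[State], Iterable[Action]],
    perform: Callable[[State, Action], State],
    max_states: int = 1000,
) -> Dict[str, List[Tuple[Action, str]]]:
    """Breadth-first exploration of UI states.

    Returns a graph mapping each state fingerprint to its list of
    (action, resulting fingerprint) edges. Because states are keyed by
    DOM content rather than URL, an action that only mutates the DOM
    (no HTTP request) still yields a new node.
    """
    graph: Dict[str, List[Tuple[Action, str]]] = {fingerprint(start): []}
    queue = deque([start])
    # Soft cap: stop expanding once about max_states states are known.
    while queue and len(graph) < max_states:
        dom = queue.popleft()
        fp = fingerprint(dom)
        for action in actions(dom):
            new_dom = perform(dom, action)
            new_fp = fingerprint(new_dom)
            graph[fp].append((action, new_fp))
            if new_fp not in graph:
                graph[new_fp] = []
                queue.append(new_dom)
    return graph


# Hypothetical toy app: every page has a client-side "toggle" button
# that opens/closes a panel, and the home page links to an about page.
def toy_actions(dom: State) -> List[Action]:
    acts = ["toggle"]
    if "home" in dom:
        acts.append("goto:about")
    return acts


def toy_perform(dom: State, action: Action) -> State:
    if action == "toggle":
        # Pure DOM change: no navigation, no request.
        if "closed" in dom:
            return dom.replace("closed", "open")
        return dom.replace("open", "closed")
    if action == "goto:about":
        return "about panel=closed"
    return dom


if __name__ == "__main__":
    g = explore("home panel=closed", toy_actions, toy_perform)
    print(len(g), "states found")
```

For the toy app this discovers four states (home/about, each with the panel open or closed), two of which differ from their predecessors only in DOM content. In practice the fingerprint would likely need to ignore volatile parts of the DOM (timestamps, random IDs), or the search never converges.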

WebKit looks like a good option for implementing my crawler, since it has all the needed technology (JavaScript engine, parsers, DOM manipulation, etc.), but as a fully featured browser it seems like overkill.

Do you know of a toolkit that provides this functionality?

+2  A: 

Rhino, Mozilla's JavaScript engine for Java: http://www.mozilla.org/rhino/

mcintyre321
A: 

I use WebKit through PyQt to parse the JavaScript and then Mechanize to interact with it.
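The Mechanize half of this setup works on the HTML it is given: it finds links and forms and submits them. Mechanize is not in the standard library, so the sketch below swaps in Python's stdlib `html.parser` to show that static step (enumerating links and form fields). This is not Mechanize's API. It runs no JavaScript, so links or forms injected by scripts will not appear unless the HTML is first rendered by an engine such as WebKit, which is presumably why the answer pairs the two. The class name `LinkAndFormParser` is hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional
from urllib.parse import urljoin


class LinkAndFormParser(HTMLParser):
    """Collect links and forms from static HTML (no JavaScript is run)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []
        self.forms: List[Dict] = []
        self._current_form: Optional[Dict] = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # attribute values may be None for boolean attributes
        if tag == "a" and a.get("href"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, a["href"]))
        elif tag == "form":
            self._current_form = {
                "action": urljoin(self.base_url, a.get("action") or ""),
                "method": (a.get("method") or "get").lower(),
                "fields": [],
            }
            self.forms.append(self._current_form)
        elif tag in ("input", "select", "textarea", "button"):
            # Only named controls inside a form are submittable fields.
            if self._current_form is not None and a.get("name"):
                self._current_form["fields"].append(a["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._current_form = None


if __name__ == "__main__":
    p = LinkAndFormParser("http://example.com/app/")
    p.feed('<a href="page2">next</a>'
           '<form action="/login" method="POST">'
           '<input name="user"><input type="password" name="pw"/>'
           '</form>')
    print(p.links)
    print(p.forms)
```

Each discovered form is a candidate action for the crawler; filling its fields and submitting it (via Mechanize or the rendering engine) produces the next state to explore.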

Plumo
A: 

If you're on a Mac, try the Fake app:

http://www.fakeapp.com

mt3