Hello, we have been tasked with emulating a browser to fetch web pages, so that we can automate tests against different sites. Ideally this will run as console-style applications in the background and generate reports.

We tried .NET with the WatiN library, but it is built on a marshalled IE and lacked many features, which we hacked in with calls to unmanaged native code. At the end of the day IE is neither thread safe nor process safe, many of the features we needed could only be implemented by changing registry values, and the whole thing was terribly inflexible.

What we need:

  • Proxy support
  • JavaScript support: we have to be able to parse the actual DOM after any JavaScript has executed (and hopefully an event is raised so we can handle any AJAX calls)
  • Ability to save entire contents of page including images FROM THE loaded page's CACHE to a separate location
  • Ability to clear cookies/cache, get the cookies/cache, etc.
  • Ability to set headers and alter post data for any browser call
  • Process and/or thread safe would be ideal
  • And for the love of drogs, an API that isn't completely cryptic
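
To make the proxy, header, POST data and cookie items concrete, here is a minimal sketch using only the Python standard library. It does not execute JavaScript or keep a page cache, so it is a baseline for comparison and not a candidate solution. The names `build_session` and `fetch` are made up for illustration.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_session(proxy_url=None, extra_headers=None):
    """Return (opener, cookie_jar). Plain HTTP only: no JavaScript is run."""
    jar = http.cookiejar.CookieJar()
    handlers = [urllib.request.HTTPCookieProcessor(jar)]
    if proxy_url:
        # Route both schemes through the same (hypothetical) proxy.
        handlers.append(
            urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
        )
    opener = urllib.request.build_opener(*handlers)
    # Replace the default headers with the ones sent on every request.
    opener.addheaders = list((extra_headers or {}).items())
    return opener, jar


def fetch(opener, url, post_data=None):
    """GET the URL, or POST if a dict of form fields is supplied."""
    data = urllib.parse.urlencode(post_data).encode() if post_data else None
    with opener.open(url, data=data, timeout=30) as response:
        return response.read()
```

Cookies collected during fetches can be read by iterating over the jar and dropped with `jar.clear()`. Because each opener owns its own jar, separate threads or processes can use separate sessions without sharing state.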

Acceptable languages: C++, C#, Python, or anything else that can produce a simple little background application, is somewhat bearable, and doesn't have a completely "untraditional" syntax like Ruby.

From my own research (and believe me, I am terrible at Google searches), I have heard good things about WebKit. Would the Qt module QtWebKit handle all of these features?

+1  A: 

I know you mentioned you don't like Ruby syntax (neither do I), but I just have to chime in and say that Watir is probably the best thing out there for what you are trying to do.

EDIT: There appears to be a Java counterpart called Watij.

Josh Stodola
Yes, that's what we saw, but we are already behind deadline and can't afford to learn the syntax, not to mention the deployment issues with the client. We need a somewhat drop-in solution that is ready today; we are already in overtime. I guess Ruby can probably be compiled into a standalone exe, and we may end up having to go with Watir if there's nothing else.
Bad Man
@Sean Understood. See edit.
Josh Stodola
"Currently Watij supports automating Internet Explorer on Windows only. Future plans are in place to support others like Mozilla." - Yea, building off IE is just doomed to fail! >.<
Bad Man
@Sean Ha, I did not notice that
Josh Stodola
A: 

You might try one of these:

http://code.google.com/p/spynner/

http://code.google.com/p/pywebkitgtk/
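
Whichever engine you pick, the "save the whole page" requirement comes down to knowing which resources the rendered page refers to. Assuming the engine can hand you the post-JavaScript HTML as a string, a rough sketch of collecting those URLs with the standard library looks like this. `ResourceCollector` and `collect_resources` are illustrative names, and this only lists URLs; pulling the bytes out of the engine's cache is engine-specific and not shown.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of images, scripts and linked files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.resources.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            self.resources.append(urljoin(self.base_url, attrs["href"]))


def collect_resources(html, base_url):
    """Return resource URLs in document order, resolved against base_url."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    return collector.resources
```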

Forest
A: 

I've only been digging into this recently myself, so I couldn't say that this does everything you've listed, but check out GeckoFx.

From the site: GeckoFX is an open-source component which makes it easy to embed Mozilla Gecko (Firefox) into any .NET Windows Forms application. Written in clean, fully commented C#, GeckoFX is the perfect replacement for the default Internet Explorer-based WebBrowser control.

As for my own impressions: it has blown away the default .NET WebBrowser in both performance and stability.

Randal