views:

664

answers:

4

I need to write some scripts that access some websites. A script run from the command line would get some pages, post some forms, screen-scrape some information, etc.

It cannot really be a library "browser" like libwww-perl, because some steps might require user interaction (captchas, Ajax-only forms, any other interaction surprises, etc.).

The most practical way I can think of would be remotely opening a tab in Firefox and injecting JavaScript into it, a bit like what Greasemonkey and Selenium do. It doesn't necessarily have to be Firefox; it can be a different browser if that's easier.

So what would be the best way to do that?

+1  A: 

I'm not sure what the "best" way to do it would be, but one possibility is to use AppleScript for the job. Firefox doesn't have extensive scripting capabilities, but if you are willing to use Safari, there is an AppleScript command for injecting JavaScript code into a document: the do JavaScript command (look it up in Safari's scripting dictionary, available from within Script Editor).

Also, in order to run AppleScripts from the command line, use osascript:

osascript path/to/script.scpt
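
As a rough illustration of combining the two ideas above, here is a minimal Python sketch that builds an osascript call around Safari's do JavaScript command. The function names (applescript_string, build_osascript_args, run_in_safari) are made up for this example, and it assumes macOS with Safari open; recent Safari versions may also need "Allow JavaScript from Apple Events" enabled in the Develop menu.

```python
import subprocess


def applescript_string(text):
    """Quote text as an AppleScript string literal (escape backslashes, then quotes)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_osascript_args(js, document_index=1):
    """Build the osascript argument list that runs js in a Safari document."""
    script = (
        'tell application "Safari" to do JavaScript '
        + applescript_string(js)
        + " in document "
        + str(document_index)
    )
    return ["osascript", "-e", script]


def run_in_safari(js):
    """Run js in Safari's first document and return the result as text (macOS only)."""
    result = subprocess.run(
        build_osascript_args(js), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Calling run_in_safari("document.title") should return the title of the page in Safari's first document. Passing the command as an argument list avoids a second layer of shell quoting around the AppleScript source.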
htw
Can I send AppleScript commands from some more usual language like Ruby, Perl, or Python?
taw
Sure, as long as you can execute system commands from within your language of choice. For example, in Python, you could use something like: os.system('osascript -e "<command 1>" -e "<command 2>" -e "<and so on…>"')
htw
A: 

Have you considered Selenium Remote Control? I've automated browser interaction using the tool before and it works very well, providing a lot of flexibility.

Depending on your exact needs, you might be able to use Selenium IDE, an easy-to-use Firefox plugin for recording and scripting browser actions.
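
For a sense of what a command-line script could look like, here is a hedged Python sketch. It uses Selenium's current WebDriver bindings rather than the older Selenium Remote Control API mentioned above, and the names fetch_title and main, as well as the example.com URL, are placeholders for illustration. It pauses so a person can deal with a captcha or other surprise in the real browser window before JavaScript is injected.

```python
def fetch_title(driver, url, pause=input):
    """Load url, wait for a human if needed, then read the title via injected JavaScript.

    driver is any object with get() and execute_script(), such as a Selenium WebDriver.
    """
    driver.get(url)
    # Give the user a chance to solve a captcha or finish an interactive step.
    pause("Finish any manual steps in the browser, then press Enter... ")
    return driver.execute_script("return document.title;")


def main():
    # Assumes the third-party selenium package and a Firefox driver are installed.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        print(fetch_title(driver, "https://example.com/"))
    finally:
        driver.quit()
```

Call main() to try it against a real browser. Because fetch_title takes the driver as a parameter, the same function works with another browser's driver in place of Firefox.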

Alex B
A: 

You can use XPCOM to extend Firefox in every way imaginable. You could maybe write some kind of interface that connects to another process.

apphacker
A: 

To write scripts on OS X there are two ways I would recommend, and both of them are in Ruby. The first is Watir, an automated testing framework that can control both Firefox and Safari on Mac OS X.

Another, perhaps better, way for screen scraping would be to use Hpricot, an HTML parser that is really easy to use.
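
To show the parser-based approach in general terms, here is a small sketch that pulls link targets out of a page. It is not Hpricot: it swaps in Python's standard-library html.parser module, and the names LinkCollector and extract_links are invented for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return the href values found in html, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, extract_links('<a href="/a">A</a>') gives ['/a']. This only sees the HTML as delivered, so content that a page builds with JavaScript still needs a real browser.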

In the background, Watir uses JSSh, a TCP/IP JavaScript shell server for Firefox, to do this. JSSh lets you control the browser from a telnet session.

Whichever way you go, if there are CAPTCHAs, they will stop you though. It's sort of the whole point of them :-)

Bruce McLeod