Suppose I am browsing a specific web page that uses JavaScript to update its view constantly (using Web 2.0 techniques to fetch updated data from its server).

Now I would like to run some code on my own computer that monitors the page contents and alerts me when specific data appears on the page, so that I can, for instance, record that data.

I am looking for ways to accomplish that. Since it's a private project, I am flexible in my choice of tools (I can program in C and REALbasic, and could manage a little JavaScript as well). The only thing out of my control is the page I want to monitor.

I would prefer a solution I can employ on Mac OS X, but Linux or Windows would be feasible, too.

First, I wonder whether solutions for this already exist, something like a user-scriptable web browser, for instance.

If that's not available, I wonder how best to approach this by programming it myself. For example, does Apple's WebKit allow me to introspect a dynamically updating web page?

As a last resort, I guess I would have to insert my own JavaScript code into the viewed web page (I think I could do that easily when the page is loaded over the internet) and have that script run periodically, introspecting the page it's in. The only thing I don't know in this case is how to get it to communicate with the outside, i.e. my computer. I could certainly write an app for it to talk to, but how could the script access my computer's resources at all to establish such a communication? As far as I understand the sandboxing of web pages, they cannot read or write local files or communicate with a socket on the computer they're running on. Or can they?
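A minimal sketch of that last-resort approach, assuming the script has been injected into the page and that a small recorder app of your own listens at `http://localhost:8080/record` (a hypothetical URL and port). A page script cannot open raw sockets or write files, but it can send an HTTP request to localhost. Sending the body as `text/plain` keeps the POST a CORS "simple request", so no preflight is needed; the script just cannot read the response unless the recorder sends an `Access-Control-Allow-Origin` header. `WATCH_PATTERN` is a placeholder, and the page's own Content Security Policy or the browser's mixed-content or private-network restrictions may still block the request.

```javascript
// Hypothetical values: adjust them to the monitored page and your recorder app.
const WATCH_PATTERN = /Price:\s*\d+(\.\d+)?/g;      // what to look for (assumption)
const RECORDER_URL = "http://localhost:8080/record"; // hypothetical local endpoint
const POLL_MS = 5000;                                // polling interval in ms

// Return the matches in `text` that have not been reported yet, and remember
// them in `seen` so that each value is reported only once.
function findNewMatches(text, pattern, seen) {
  const fresh = [];
  for (const m of text.matchAll(pattern)) {
    if (!seen.has(m[0])) {
      seen.add(m[0]);
      fresh.push(m[0]);
    }
  }
  return fresh;
}

// Browser-only part: poll the live page and POST new matches to the local app.
if (typeof document !== "undefined") {
  const seen = new Set();
  setInterval(() => {
    const fresh = findNewMatches(document.body.innerText, WATCH_PATTERN, seen);
    if (fresh.length > 0) {
      fetch(RECORDER_URL, {
        method: "POST",
        // text/plain avoids a CORS preflight; the body is still JSON.
        headers: { "Content-Type": "text/plain" },
        body: JSON.stringify({ url: location.href, matches: fresh, at: Date.now() }),
      }).catch((err) => console.warn("recorder not reachable:", err));
    }
  }, POLL_MS);
}
```

Polling on a timer is the simplest option here; the recorder app on the other end can be written in any language that can accept an HTTP POST.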

So, any ideas are welcome, as long as they respect the fact that a browser or its engine has to render the page and run the page's JavaScript.

+1  A: 

This sounds like it could be pretty easy using Jetpack in Firefox.

You can create browser extensions using JavaScript; it's still in alpha but looks workable (and awesome).

Greg
From Jetpack I was led to Greasemonkey, and its Wikipedia page lists similar alternatives: http://en.wikipedia.org/wiki/Greasemonkey#Similar_software
Thomas Tempelmann
Ah yes, I forgot about Greasemonkey. I haven't tried any of the similar ones.
Greg
How are you going to run non-sandboxed code ("read/write local files or communicate with a socket") from Greasemonkey?
Matthew Flaschen
See my post below about REALbasic: it's not a free tool, but since I have already paid for it, I'll happily go that way because it appears to be the simplest option of all.
Thomas Tempelmann
+1  A: 

I agree you could definitely do this with a Firefox extension (I haven't used Jetpack, and I don't know whether it could handle this). Firefox extensions can communicate with arbitrary XPCOM components, so the extension would have a small JavaScript part to extract the data from the DOM, then pass it to a C(++) XPCOM component to do everything else.

See Creating a C++ XPCOM component and Creating Custom Firefox Extensions with the Mozilla Build System
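The "small JavaScript part" could look roughly like the sketch below. It uses the standard DOM `MutationObserver` API to re-read the interesting elements whenever the page's own scripts change the document, instead of polling on a timer. `DATA_SELECTOR` is a hypothetical CSS selector, and the hand-off to the native (XPCOM) component is only marked by a `console.log`; that part is not shown.

```javascript
// Hypothetical selector for the elements that carry the data of interest.
const DATA_SELECTOR = ".ticker-value";

// Collect trimmed, non-empty text from a list of elements (anything with a
// textContent property), so this logic can be tested without a browser.
function extractTexts(elements) {
  const out = [];
  for (const el of elements) {
    const text = (el.textContent || "").trim();
    if (text !== "") out.push(text);
  }
  return out;
}

// Browser-only part: call onData with the current values every time the DOM
// changes (nodes added/removed or text edited anywhere under <body>).
function watchDom(onData) {
  const observer = new MutationObserver(() => {
    onData(extractTexts(document.querySelectorAll(DATA_SELECTOR)));
  });
  observer.observe(document.body, { childList: true, subtree: true, characterData: true });
  return observer; // call observer.disconnect() to stop watching
}

if (typeof document !== "undefined" && typeof MutationObserver !== "undefined") {
  // Placeholder for handing the values to the native component.
  watchDom((texts) => console.log("current values:", texts));
}
```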

Matthew Flaschen
A: 

Actually, I just realized that the Monkeybread plugin for REALbasic offers everything I need, and in a much easier way than Jetpack would:

http://www.monkeybreadsoftware.de/pluginhelp/example-cocoa-domformfields.shtml

I can thus write my own browser that fetches the web pages and then extracts the DOM data, and can even modify it.

Thomas Tempelmann