views: 76
answers: 3

Hi folks,

My goal is to let less experienced users set up the parameters required to scrape some information from a website.

The idea is that a user enters a URL, which is then loaded in a frame. The user should then be able to select text within this frame, and that selection should give me enough information to scrape the same piece of information again later, when it changes dynamically.

The question is whether it's even possible to detect which part of the external site's source corresponds to the user's selection in the frame.

If not, are there any alternatives?

Thanks in advance.

Regards, Tom

+3  A: 

The short answer is no. If you don't control the content in the iframe, the browser's same-origin policy keeps your scripts from reading or interacting with it.
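
For example, assuming the external site is loaded in a plain iframe on your page, this is roughly where the browser stops you (a short TypeScript sketch, just for illustration):

    const frame = document.querySelector("iframe");
    if (frame) {
      // For a cross-origin page, contentDocument is null, and touching
      // contentWindow.document throws a SecurityError in most browsers.
      console.log(frame.contentDocument); // null for an external site
    }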

However, you could make a bookmarklet that does something like what you're describing, or a browser plugin.
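
A rough sketch of what such a bookmarklet could capture, written in TypeScript for readability (the function name and the CSS-path heuristic below are my own illustration, not part of this answer):

    // A bookmarklet runs in the context of the page the user is viewing,
    // so it is not blocked the way a cross-origin iframe is.
    function describeSelection(): { text: string; path: string } | null {
      const sel = window.getSelection();
      if (!sel || sel.isCollapsed) return null;

      // Walk up from the selection's anchor node and build a rough CSS-style
      // path that a scraper can later use to find the same element again.
      let el: Element | null =
        sel.anchorNode instanceof Element
          ? sel.anchorNode
          : sel.anchorNode?.parentElement ?? null;
      const parts: string[] = [];
      while (el && el !== document.body) {
        const idPart = el.id ? "#" + el.id : "";
        parts.unshift(el.tagName.toLowerCase() + idPart);
        if (idPart) break; // an id is usually specific enough to stop here
        el = el.parentElement;
      }
      return { text: sel.toString(), path: parts.join(" > ") };
    }

A real bookmarklet would then send that object back to your server, and the scraper would use the recorded path to locate the same element on later visits.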

George Mandis
I will ask a follow-up question at a later date. Thanks.
Tom
A: 

Have a look at iMacros. It provides browser add-ons for IE, Firefox and Chrome to record a web-browsing sequence. The Firefox/Chrome add-ons are open-source/freeware. You could then use the "macro" created by this recorder as input for your screen-scraping code (or even replay iMacros itself on your server).

http://www.iopus.com/imacros/firefox/ (free + open source)

http://www.iopus.com/imacros/chrome/ (free + open source)

http://www.iopus.com/download/imacros-ie/ ("only" free)

MikeK
+1  A: 

There have been attempts at visually driven scrapers before, but they rapidly become more cumbersome and complex to learn than writing code. With a few abstractions (a function to scrape a page, a function to select a table by ID and convert it to an array, etc.) you can make something that is still approachable for beginners.
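
A minimal sketch of that kind of abstraction, written in TypeScript for the browser (the function names and the use of fetch/DOMParser are my own illustration; in practice the fetch would usually run server-side or through a proxy because of CORS):

    // Fetch a page and parse it into a Document.
    async function fetchDocument(url: string): Promise<Document> {
      const response = await fetch(url);
      const html = await response.text();
      return new DOMParser().parseFromString(html, "text/html");
    }

    // Select a table by its id and convert it into a 2D array of cell text,
    // which is about as much structure as a beginner needs to work with.
    function tableToArray(doc: Document, tableId: string): string[][] {
      const table = doc.getElementById(tableId);
      if (!(table instanceof HTMLTableElement)) return [];
      return Array.from(table.rows).map(row =>
        Array.from(row.cells).map(cell => cell.textContent?.trim() ?? "")
      );
    }

    // Example usage:
    // const doc = await fetchDocument("https://example.com/prices");
    // const rows = tableToArray(doc, "price-table");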

ScraperWiki