Is there any way I can scrape web pages that use AJAX?

I would like to use something like Ruby + Mechanize on a Linux server that doesn't have a monitor attached (a linode.com server, for example).

http://watir.com/ would be a solution, but I guess it is not applicable to Linode.

+1  A: 

iMacros for Firefox/Chrome (free/open source) works with many AJAX sites and works on Linux, too. Use the command line to control its scraping. The Chrome version is still a bit buggy, but the Firefox version works great.

MikeK
Can I use iMacros without Firefox? I guess I won't be able to install Firefox on a Linux box that has no monitor. I am not 100% sure, though :-)
Radek
No. What separates iMacros from mechanize is exactly that it runs inside a web browser. Thus websites are rendered by the browser before they are automated.
MikeK
We use it on Windows AWS EC2 instances (i.e., without a monitor), so that is no problem: http://wiki.imacros.net/How_to_Schedule_a_RemoteInteractive_Session#Alternative:_Use_Windows_Autologon I am sure the approach can be done on Linux. We use Windows because we need to run IE, too.
MikeK
Hm, interesting. I'm just wondering how I can set it up without having Firefox running on the box.
Radek
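One possible way to approach the setup discussed in these comments: Firefox still has to run on the box, but it does not need a physical monitor if it is started inside a virtual X display such as Xvfb. The Ruby sketch below only builds such a command line. It assumes `xvfb-run` and `firefox` are installed on the server; the helper name `headless_firefox_command`, the macro file name, and the `imacros://run/?m=...` URL form are illustrative assumptions that should be checked against the iMacros documentation for your version.

```ruby
# Sketch: build a command that starts Firefox inside a virtual X display
# (Xvfb), so a browser-based scraper can run on a server with no monitor.
# Assumptions: xvfb-run and firefox are installed; the imacros:// URL form
# and the macro file name are placeholders, not verified for every version.

# Hypothetical helper: returns the command as an argument array.
def headless_firefox_command(macro_name, screen: "1280x1024x24")
  [
    "xvfb-run",                           # wrapper that starts and stops Xvfb
    "--auto-servernum",                   # pick a free X display number
    "--server-args=-screen 0 #{screen}",  # virtual screen size and depth
    "firefox",
    "imacros://run/?m=#{macro_name}"      # assumed iMacros launch URL
  ]
end

if __FILE__ == $PROGRAM_NAME
  cmd = headless_firefox_command("scrape_example.iim")
  puts cmd.join(" ")
  # To actually launch it (requires Xvfb and Firefox on the server):
  # system(*cmd)
end
```

Passing the command to `system` as an array avoids shell quoting problems with the space inside `--server-args`.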
+1  A: 

Check out TestPlan. It can run tests without a monitor by using the HtmlUnit backend. It handles quite a lot of JavaScript, including AJAX. I use it to scrape several pages and have built several AJAX tests with it.

You can also run TestPlan with a browser if you want. This gives you the best of both worlds: develop tests and visually see what is happening, and then switch to the display-less mode.

edA-qa mort-ora-y