views: 528
answers: 6

What is the best solution to programmatically take a snapshot of a webpage?

The situation is this: I would like to crawl a bunch of webpages and take thumbnail snapshots of them periodically, say once every few months, without having to manually go to each one. I would also like to be able to take jpg/png snapshots of websites that might be built entirely in Flash/Flex, so I'd need some way to wait until the page has loaded before taking the snapshot.

It would be nice if there was no limit to the number of thumbnails I could generate (within reason, say 1000 per day).

Any ideas how to do this in Ruby? Seems pretty tough.

Browsers to do this in: Safari or Firefox, preferably Safari.

Thanks so much.

A: 

as viewed by.... ie? firefox? opera? one of the myriad webkit engines?

if only it were possible to automate http://browsershots.org :)

Oren Mazor
preferably safari, though firefox is fine. this won't be changing :)
viatropos
+4  A: 

This really depends on your operating system. What you need is a way to hook into a web browser engine and save its rendered output as an image.

If you are on a Mac, I would imagine your best bet would be to use MacRuby (or RubyCocoa, although I believe that is going to be deprecated in the near future) and then use the WebKit framework to load the page and render it as an image.

This is definitely possible; for inspiration, you may wish to look at the Paparazzi! and webkit2png projects.

Another option, which isn't dependent on the OS, might be to use the BrowserShots API.
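The webkit2png route mentioned above is easy to drive from Ruby by shelling out. A minimal sketch, assuming Paul Hammond's webkit2png script is on the PATH; the flag names (--thumb, --delay, --dir) are from that script and may differ between versions:

```ruby
# Sketch: batch thumbnails by shelling out to webkit2png (Mac OS X only).
# Assumes the webkit2png script is on the PATH; flag names are best-effort
# and may vary by version.
def webkit2png_args(url, delay: 5, dir: "thumbs")
  # --delay waits after the page finishes loading, which gives
  # Flash/Flex content a chance to render before the screenshot
  ["webkit2png", "--thumb", "--delay=#{delay}", "--dir=#{dir}", url]
end

urls = %w[http://example.com http://example.org]
urls.each do |url|
  system(*webkit2png_args(url))   # returns nil if webkit2png is missing
end
```

Passing the arguments as an array to `system` avoids shell-quoting problems with odd characters in URLs.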

Olly
Objective-C + Webkit, cool!
OscarRyz
+1  A: 

Use Selenium RC; it comes with screenshot capabilities.
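A hedged sketch with the selenium-client gem. The browser string `*safari`, the default RC port 4444, and the `capture_entire_page_screenshot` call are taken from the Selenium RC docs as I remember them; the RC server must already be running, so the driver part is guarded behind an environment variable:

```ruby
require 'uri'

# Pure helper: derive a screenshot filename from a URL's host.
def screenshot_path(url, dir = "shots")
  File.join(dir, "#{URI.parse(url).host}.png")
end

# Guarded: needs the selenium-client gem and a running Selenium RC server.
# Option and method names follow the Selenium RC API and are best-effort.
if ENV["RUN_SELENIUM"]
  require 'selenium/client'
  browser = Selenium::Client::Driver.new(
    :host    => "localhost",
    :port    => 4444,
    :browser => "*safari",          # or "*firefox"
    :url     => "http://example.com",
    :timeout_in_second => 60
  )
  browser.start
  browser.open("/")
  sleep 5                           # crude wait for Flash content to load
  browser.capture_entire_page_screenshot(screenshot_path("http://example.com"))
  browser.stop
end
```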

flybywire
any examples to get started? I've also tried PageGlimpse, which is pretty easy, but there's no examples.
viatropos
+3  A: 

There is no built-in library in Ruby for rendering a web page.

The Who
A: 

With jruby you can use SWT's browser library.
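A heavily hedged JRuby sketch of that idea: render the page in SWT's Browser widget, copy the widget's pixels into an Image with GC#copyArea, and scale it down. It needs swt.jar on the classpath and a display, so it is guarded behind an environment variable; all the SWT class and method names below are from the Java API and untested here:

```ruby
# Pure helper: thumbnail height preserving the aspect ratio.
def thumb_height(width, height, thumb_width)
  (height * thumb_width) / width
end

# Guarded: requires JRuby with swt.jar on the classpath. The SWT names
# below (Display, Shell, Browser, GC#copyArea, ImageData#scaledTo) are
# from the Java API and are assumptions, not verified code.
if ENV["RUN_SWT"]
  require 'java'
  java_import org.eclipse.swt.widgets.Display
  java_import org.eclipse.swt.widgets.Shell
  java_import org.eclipse.swt.browser.Browser
  java_import org.eclipse.swt.graphics.GC
  java_import org.eclipse.swt.graphics.Image
  java_import org.eclipse.swt.graphics.ImageLoader
  java_import org.eclipse.swt.SWT

  display = Display.new
  shell   = Shell.new(display)
  shell.set_size(1024, 768)
  browser = Browser.new(shell, SWT::NONE)
  browser.set_size(1024, 768)
  shell.open
  browser.set_url("http://example.com")
  sleep 5                                 # crude wait for the page to load

  image = Image.new(display, 1024, 768)
  gc = GC.new(browser)
  gc.copy_area(image, 0, 0)               # copy the widget's pixels
  gc.dispose

  data = image.image_data.scaled_to(200, thumb_height(1024, 768, 200))
  ImageLoader.field_accessor :data        # expose the public Java field
  loader = ImageLoader.new
  loader.data = [data]
  loader.save("thumb.png", SWT::IMAGE_PNG)
  display.dispose
end
```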

jrhicks
A: 

Just to add one more site that generates thumbnails: sitethumbshot.com

Hannah