views: 191
answers: 3

Hi folks,

I'm indexing a list of links; these links update quite often, so I'm automating thumbnail generation for the sites.

For most sites it's easy: I just grab the biggest image on the page and hope it describes the content.
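A minimal sketch of the "biggest image" heuristic, using only the standard library's `html.parser`. It ranks `<img>` tags by their declared `width`/`height` attributes; real pages often omit these, so in practice you would also download the candidates and check their actual pixel sizes (e.g. with Pillow), which this sketch deliberately skips.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect <img> tags with the area implied by their width/height attributes."""

    def __init__(self):
        super().__init__()
        self.images = []  # list of (src, declared_area) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        src = a.get("src")
        if not src:
            return
        try:
            area = int(a.get("width", 0)) * int(a.get("height", 0))
        except ValueError:
            area = 0  # non-numeric dimensions (e.g. "100%") rank last
        self.images.append((src, area))


def biggest_image(html):
    """Return the src of the <img> with the largest declared area, or None."""
    parser = ImageCollector()
    parser.feed(html)
    if not parser.images:
        return None
    return max(parser.images, key=lambda pair: pair[1])[0]
```

This keeps the scraper dependency-free, at the cost of trusting the page's own markup for image sizes.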

But other times a video is the main content of the page.


Does anybody have tips for dealing with this? That would be great!


Regarding using WebKit to create screenshots, I found this

+3  A: 

wkhtmltopdf uses an embedded copy of the WebKit rendering engine (the one used in Safari, Chrome, etc.) to save a web page as a PDF, including all images (though no Flash video, I'd guess). That could be a starting point for a much more accurate thumbnail.
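For thumbnails specifically, the wkhtmltopdf project also ships a companion `wkhtmltoimage` binary that renders a page straight to an image. A small wrapper might look like this; it assumes `wkhtmltoimage` is installed and on `PATH`, and uses its `--width` option to set the virtual browser-window width.

```python
import subprocess


def thumbnail_command(url, out_path, width=1024):
    """Build the wkhtmltoimage invocation that renders `url` to an image file.

    The output format (e.g. PNG) is inferred from `out_path`'s extension.
    """
    return ["wkhtmltoimage", "--width", str(width), url, out_path]


def render_thumbnail(url, out_path, width=1024):
    """Run the external renderer; raises CalledProcessError on failure."""
    subprocess.run(thumbnail_command(url, out_path, width), check=True)
```

You would then downscale the captured image to thumbnail size with an image library of your choice.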

Wim
+1 nice starting point.
N 1.1
This is a great idea. I'll see what I can find. It would be **great** if I can figure out how to use this.
RadiantHex
+3  A: 

There exist free and paid services that do exactly what you need. I use Shrink The Web:

puzz
+1  A: 

If the sites you are scraping from support oEmbed, you can let them do the heavy lifting for you. Search this PDF for "oEmbed" for a good start. http://bit.ly/adx4bi
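For video pages this works especially well, because an oEmbed response includes a `thumbnail_url` field (defined by the oEmbed spec) pointing at a provider-chosen preview image. A stdlib-only sketch; the two endpoints listed are real published ones, but in general you would discover the endpoint from the page's `<link rel="alternate" type="application/json+oembed">` tag or a provider registry.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Example oEmbed endpoints for two common video providers.
OEMBED_ENDPOINTS = {
    "youtube.com": "https://www.youtube.com/oembed",
    "vimeo.com": "https://vimeo.com/api/oembed.json",
}


def oembed_request_url(endpoint, media_url, maxwidth=320):
    """Build the oEmbed API call asking for JSON and a bounded thumbnail width."""
    query = {"url": media_url, "format": "json", "maxwidth": maxwidth}
    return endpoint + "?" + urlencode(query)


def fetch_thumbnail_url(endpoint, media_url):
    """Fetch the oEmbed response and return its thumbnail_url (or None)."""
    with urlopen(oembed_request_url(endpoint, media_url)) as resp:
        return json.load(resp).get("thumbnail_url")
```

The provider does the heavy lifting: no page scraping, no rendering engine, just one JSON request per link.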

jcdyer
Thank you! This is indeed useful!
RadiantHex
Indeed! I just learned about oEmbed at PyCon, and am looking forward to playing around with it myself.
jcdyer