First off I want to say that I wasn't really sure where to post this but it is very much programming related. If it is in the wrong spot I apologize and please let me know where I should post it instead.

When sharing an article on a friend's wall, Facebook will grab a thumbnail of the article. How do they always get the right thumbnail from articles?

It doesn't grab the logo img element of http://www.nytimes.com/2010/06/07/world/asia/07convoys.html?hp for example, but rather grabs the correct image element that corresponds with the article.

I'm looking to do something similar and was wondering about a good way to parse the HTML to find the image given this example. Thanks.

+2  A: 

They don't always grab the correct image, even though there's certainly some good logic in place.

In many cases, I've seen a list of thumbnails to choose from, meaning Facebook's parser considered them equally relevant.

I would guess they (probably among other things) look at the DOM structure and find images close to content that looks "shareable".

UPDATE:

After some empirical testing, it seems that image dimensions play a big role. Images that are too small or too wide are not considered thumbnails. If your logo is the right size though, expect it to show up as one of the thumbnails. Try sharing something on http://www.e24.se for example.
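
To make the dimension idea concrete, here is a minimal Python sketch of such a filter. The function name and both thresholds (`min_side`, `max_aspect`) are my own assumptions for illustration; I don't know what limits Facebook actually uses.

```python
def looks_like_thumbnail(width, height, min_side=50, max_aspect=3.0):
    """Guess whether an image of this size is a plausible thumbnail.

    min_side and max_aspect are hypothetical thresholds, not Facebook's.
    """
    # Reject missing or nonsensical dimensions outright.
    if width <= 0 or height <= 0:
        return False
    # Too small: icons, spacers, tracking pixels.
    if width < min_side or height < min_side:
        return False
    # Too wide (or too tall): banners and separators.
    aspect = max(width, height) / min(width, height)
    return aspect <= max_aspect
```

With these made-up limits a 728x90 banner is rejected, while a 200x150 photo passes, and so would a logo that happens to be the right size.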

Lauri Lehtinen
+13  A: 

Actually, Facebook's way of finding thumbnails isn't so magical. It searches for a set of <meta> and <link> tags which specify which title, description, and image to use.

If it cannot find any of the <meta> and <link> tags it is looking for, it basically asks the user to choose whichever <img> tag fits.

In the case of the NY Times, it uses the following:

<meta name="thumbnail" content="whatever.jpg" />

Facebook recommends you use a <link> tag instead for the thumbnail.

<meta name="title" content="title" />
<meta name="description" content="description " />
<link rel="image_src" href="thumbnail_image" />

Source: Facebook Share/Specifying Meta Tags
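
If you want to do the same lookup yourself, a rough Python sketch using only the standard library could look like the following. The class and function names are hypothetical, and preferring `<link rel="image_src">` over `<meta name="thumbnail">` is my assumption based on the recommendation above, not a documented rule.

```python
from html.parser import HTMLParser


class ShareTagParser(HTMLParser):
    """Collects the share-related <meta> and <link> tags from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}         # first value seen per <meta name="...">
        self.image_src = None  # href of the first <link rel="image_src">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            # Keep the first occurrence of each name and trim stray whitespace.
            self.meta.setdefault(attrs["name"].lower(), attrs["content"].strip())
        elif tag == "link" and (attrs.get("rel") or "").lower() == "image_src":
            if self.image_src is None:
                self.image_src = attrs.get("href")


def find_declared_thumbnail(html):
    """Return the thumbnail URL the page declares, or None if it declares none."""
    parser = ShareTagParser()
    parser.feed(html)
    # Assumption: the <link> tag wins over the <meta> tag when both exist.
    return parser.image_src or parser.meta.get("thumbnail")
```

When this returns None, you are in the fallback case: no declared thumbnail, so you have to pick among the page's `<img>` tags or ask the user.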

Andrew Moore
Right, but for example I found a Tom's Hardware article, http://www.tomshardware.com/picturestory/538-computex-2010-booth-babes.html, where it picks out the right thumbnail initially, without the meta tag or any other indicator on the page.
Travis
@Travis: Which `<img>` tag is closer to the biggest chunk of text on the page... Or which `<img>` tag is occupying the most space. It guesses right on some pages, but not on others.
Andrew Moore
Alright, makes pretty good sense now, thanks for the help Andrew.
Travis
A: 

These are just guesses as I don't have any knowledge of Facebook's internal operations, but if I were parsing thumbnails from a page I would consider several things:

  • Size of the image, as previously stated
  • Relevant keywords in the src or alt attributes
  • Location of the <img> tag on page, the closer to relevant content the better, but may not always work for complicated layouts
  • Absence of ad-related keywords in the <img> tag or nearby tags (doubleclick comes to mind)
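
As a rough illustration of how those signals could be combined, here is a Python sketch. Everything in it is an assumption on my part: the ad keyword list, the scoring formula, and the function names are invented for the example, and it leaves out the position-on-page signal because that needs a real layout model.

```python
import re

# Hypothetical ad markers; a real list would be much longer.
AD_HINTS = ("doubleclick", "adserver", "banner", "/ads/")


def score_image(src, alt, width, height, title_words):
    """Return a rough relevance score; higher means a better candidate."""
    haystack = f"{src} {alt}".lower()
    # Anything that looks ad-related is disqualified.
    if any(hint in haystack for hint in AD_HINTS):
        return 0.0
    # Bigger images start with a higher score.
    score = float(width * height)
    # Each title word found in the URL or alt text boosts the score.
    words = set(re.findall(r"[a-z0-9]+", haystack))
    matches = sum(1 for w in title_words if w.lower() in words)
    return score * (1 + matches)


def pick_thumbnail(images, title):
    """Pick the best-scoring image dict, or None if nothing qualifies."""
    # Ignore short words such as "in" or "the".
    title_words = [w for w in re.findall(r"[A-Za-z0-9]+", title) if len(w) > 3]
    scored = [
        (score_image(i["src"], i.get("alt", ""), i["width"], i["height"], title_words), i)
        for i in images
    ]
    best_score, best = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    return best if best_score > 0 else None
```

Each image here is a plain dict with `src`, `alt`, `width`, and `height` keys, which you would fill in from whatever HTML parser and image fetcher you use.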

Also, as far as I know the Facebook meta tags are fairly new, so my guess is that the link page scraper is still grabbing images the hard way ;) However, if you're running a site and want Facebook to grab the right information when it scrapes your pages, I highly suggest implementing them.

johnny_bgoode
A: 

Meneame.net, a Spanish Digg-like site, performs a similar operation on submitted news. Meneame is free software released under the Affero GPL license, so you can take a look at its implementation: http://websvn.meneame.net/filedetails.php?repname=meneame&path=%2Fbranches%2Fversion3%2Fwww%2Flibs%2Fwebimages.php

It can grab thumbnails from HTML 5 videos, images and YouTube/Metacafe videos embedded in a webpage.

If you use it, remember to comply with the Affero GPL.

Matachana