If you're familiar with Reddit, you'll know that every post linking to a picture gets a small thumbnail preview beside the title of the submission. How does Reddit go about doing that? Does it just check whether the link ends with .jpg, .png, .bmp, etc.?

A: 

You can check for the content of the <img> tag.

DJ
Do you mean the content of the <img> tag on Reddit? It looks like they compress and downscale the image and save it as a thumbnail-sized copy on their server.
vette982
+3  A: 

reddit will try to pull a thumbnail from any source, not just an image URL. It does this first by applying set rules for specific sites, and second by running one generic process that retrieves thumbnails for unknown URLs. All of this runs as an automated periodic task.

One of the (many) benefits of reddit is that the source code is open, and if you understand Python, you should check out /r2/lib/scraper.py for a more detailed look at how this process works.

Also, while StackOverflow is a great place to have programming-related questions answered, you might also want to check out reddit's own /r/redditdev for information on reddit development.

Hey there redditor!

Bauer
A: 
  1. Indeed, if the URL contains .jpg, .png, etc., use that.
  2. If the site is a popular domain (flickr.com, youtube.com, amazon.com, etc.), have a set of predefined rules to extract something you know will be relevant (be it the featured image, the YouTube thumbnail, the Amazon product image, etc.).
  3. Otherwise, if all you have to work with is some HTML, you'll have to dig it out yourself. You could choose the first image on the page, the biggest by size, or even the one you've algorithmically determined to be the most relevant (e.g. relatively big and inside what you think is the main body content).
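
As a rough sketch of the three steps above, here is how the decision could look in Python using only the standard library. This is not reddit's scraper code. Every name here (`choose_thumbnail_source`, `SITE_RULES`, `ImgCollector`, and so on) is made up for illustration, the YouTube thumbnail URL pattern is an assumption about how that site serves thumbnails, and "biggest by size" is approximated by the `width` and `height` attributes declared in the HTML rather than by downloading the images.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlparse

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def is_image_url(url):
    """Step 1: does the URL path end in a known image extension?"""
    path = urlparse(url).path
    return posixpath.splitext(path)[1].lower() in IMAGE_EXTENSIONS


def youtube_thumbnail(url):
    """Step 2 example rule (assumed URL pattern, not an official API)."""
    video_id = parse_qs(urlparse(url).query).get("v", [None])[0]
    if video_id is None:
        return None
    return "https://img.youtube.com/vi/%s/hqdefault.jpg" % video_id


# Hypothetical per-domain rules; a real table would cover many more sites.
SITE_RULES = {"youtube.com": youtube_thumbnail}


class ImgCollector(HTMLParser):
    """Collects (src, declared_area) for every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src")
        if not src:
            return
        try:
            area = int(attr.get("width", 0)) * int(attr.get("height", 0))
        except (TypeError, ValueError):
            area = 0  # missing or non-numeric dimensions
        self.images.append((src, area))


def pick_from_html(html, base_url):
    """Step 3: largest declared image, or the first one if none declare a size."""
    parser = ImgCollector()
    parser.feed(html)
    if not parser.images:
        return None
    # max() returns the first maximal item, so ties fall back to page order.
    src, _ = max(parser.images, key=lambda item: item[1])
    return urljoin(base_url, src)


def choose_thumbnail_source(url, html=None):
    """Try the three strategies in order and return an image URL or None."""
    if is_image_url(url):
        return url
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    rule = SITE_RULES.get(host)
    if rule is not None:
        found = rule(url)
        if found:
            return found
    if html is not None:
        return pick_from_html(html, url)
    return None
```

You would call `choose_thumbnail_source(link, page_html)` after fetching the page yourself, then download and resize whatever URL it returns. Checking the response's Content-Type header is usually more reliable than the file extension, but the extension check keeps the sketch free of network calls.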

If you have to resort to the last option, one technique I'd recommend is to extract multiple images and A/B test them to find the one with the best click-through rate. Given enough traffic, that way you can nearly always get the best one.
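
To make the A/B idea concrete, here is a minimal sketch in Python. It assumes you can record when a candidate thumbnail is shown and when it is clicked. The class name `ThumbnailTest`, its methods, and the `min_impressions` threshold are all hypothetical, and a real test would also want a proper significance check before declaring a winner.

```python
import random


class ThumbnailTest:
    """Tracks impressions and clicks for a set of candidate thumbnails."""

    def __init__(self, candidates, rng=None):
        self.stats = {c: {"shown": 0, "clicked": 0} for c in candidates}
        self.rng = rng or random.Random()

    def serve(self):
        """Pick a candidate uniformly at random and count the impression."""
        choice = self.rng.choice(list(self.stats))
        self.stats[choice]["shown"] += 1
        return choice

    def record_click(self, candidate):
        self.stats[candidate]["clicked"] += 1

    def click_through_rate(self, candidate):
        s = self.stats[candidate]
        return s["clicked"] / s["shown"] if s["shown"] else 0.0

    def winner(self, min_impressions=100):
        """Best click-through rate among candidates with enough impressions."""
        eligible = [
            c for c, s in self.stats.items() if s["shown"] >= min_impressions
        ]
        if not eligible:
            return None  # not enough data yet
        return max(eligible, key=self.click_through_rate)
```

Call `serve()` each time the submission is rendered, `record_click()` when the shown thumbnail is clicked, and switch to `winner()` once it stops returning `None`.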

Ashley Williams