If you're familiar with Reddit, you'll know how all of their posts containing pictures get a small thumbnail preview beside the title of the submission. How does Reddit go about doing that? Does it just check to see if the link ends with .jpg, .png, .bmp, etc.?
reddit will try to pull a thumbnail from any source, not just an image URL. It does this first by applying set rules for specific sites, and second by running one generic process that retrieves thumbnails for unknown URLs. The whole thing runs as an automated periodic task.
One of the (many) benefits of reddit is that the source code is open, and if you understand Python, you should check out /r2/lib/scraper.py for a more detailed look at how this process works.
Also, while Stack Overflow is a great place to have programming-related questions answered, you might want to check out reddit's own /r/redditdev for information on reddit development.
- Indeed, if the URL contains .jpg, .png, etc., use that.
- If the site is a popular domain (flickr.com, youtube.com, amazon.com, etc.), have a set of predefined rules to extract something you know will be relevant (be it the featured image, YouTube thumbnail, Amazon product image, etc.).
- Otherwise, if all you have to work with is some HTML, you'll have to dig it out yourself. You could choose the first image on the page, the biggest by size, or even the one you've algorithmically determined to be the most relevant (e.g. relatively big, inside what you think is the main body content).
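To make the three tiers above concrete, here is a rough Python sketch. It is not reddit's actual scraper; every name in it (`pick_thumbnail`, `SITE_RULES`, `youtube_thumbnail`, `largest_image`) is made up for illustration. It assumes the page HTML has already been downloaded, checks the extension on the URL path rather than anywhere in the URL, fills in only one site rule (YouTube's `img.youtube.com/vi/<id>/hqdefault.jpg` thumbnail pattern), and for the generic case simply picks the `<img>` with the largest declared `width` x `height`.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlparse

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".bmp")


def youtube_thumbnail(url):
    """Hypothetical site rule: build a thumbnail URL from a YouTube watch link."""
    video_ids = parse_qs(urlparse(url).query).get("v")
    if not video_ids:
        return None
    return "https://img.youtube.com/vi/%s/hqdefault.jpg" % video_ids[0]


# Domain -> extractor. Only one rule is sketched; others would have the same shape.
SITE_RULES = {"youtube.com": youtube_thumbnail}


class ImageCollector(HTMLParser):
    """Collects (declared_area, src) for every <img> tag that has a src."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        src = attrs.get("src")
        if not src:
            return
        try:
            area = int(attrs.get("width", 0)) * int(attrs.get("height", 0))
        except (TypeError, ValueError):
            area = 0  # missing or non-numeric dimensions such as "100%"
        self.images.append((area, src))


def largest_image(html):
    """Return the src of the image with the largest declared area, or None."""
    collector = ImageCollector()
    collector.feed(html)
    if not collector.images:
        return None
    # max() keeps the first maximal item, so ties fall back to page order.
    return max(collector.images, key=lambda item: item[0])[1]


def pick_thumbnail(url, html=""):
    parsed = urlparse(url)
    # 1. The link itself is an image.
    if parsed.path.lower().endswith(IMAGE_EXTENSIONS):
        return url
    # 2. A known domain with a predefined rule.
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    rule = SITE_RULES.get(host)
    if rule:
        thumb = rule(url)
        if thumb:
            return thumb
    # 3. Generic fallback: dig an image out of the HTML.
    src = largest_image(html)
    return urljoin(url, src) if src else None
```

Declared dimensions are often missing in real pages, so a more serious version would probably download the candidates and measure them, but the dispatch order stays the same.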
If you have to resort to the last option, one technique I'd recommend is to extract multiple images and A/B test them to find the one with the best click-through rate. That way you can nearly always get the best one.
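A minimal sketch of that A/B test might look like the following. The class and method names (`ThumbnailExperiment`, `assign`, `record_click`, `winner`) and the `min_impressions` threshold are assumptions made up for this example. It shows each viewer a uniformly random candidate, counts impressions and clicks, and only declares a winner once a candidate has been shown often enough. It does no statistical significance testing, which a real experiment would want.

```python
import random


class ThumbnailExperiment:
    """Tracks impressions and clicks for each candidate thumbnail."""

    def __init__(self, candidates, rng=None):
        self.candidates = list(candidates)
        self.shown = {c: 0 for c in self.candidates}
        self.clicked = {c: 0 for c in self.candidates}
        self.rng = rng or random.Random()

    def assign(self):
        """Pick a candidate uniformly at random and count the impression."""
        choice = self.rng.choice(self.candidates)
        self.shown[choice] += 1
        return choice

    def record_click(self, candidate):
        self.clicked[candidate] += 1

    def click_through_rate(self, candidate):
        shown = self.shown[candidate]
        return self.clicked[candidate] / shown if shown else 0.0

    def winner(self, min_impressions=100):
        """Best click-through rate among candidates with enough impressions, else None."""
        eligible = [c for c in self.candidates if self.shown[c] >= min_impressions]
        if not eligible:
            return None
        return max(eligible, key=self.click_through_rate)
```

You would call `assign()` whenever the submission is rendered, `record_click()` when the viewer clicks through, and switch to `winner()` permanently once it stops returning `None`.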