I'm developing a feature on a forum site that will allow users to include a link and other types of content in a post (to clarify a question or answer).
For the link feature, I have several steps to implement:
- Validate the URI entered (well formed, valid scheme, etc.)
- Validate that the remote resource exists
- Extract images from within the remote page
- Show the user the set of images and let them choose one
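To make the first and third steps concrete, here is a minimal sketch in Python using only the standard library. It is not the site's actual implementation: the names `is_valid_uri`, `ImageCollector`, `extract_images` and the `ALLOWED_SCHEMES` set are hypothetical, and it assumes the page HTML has already been downloaded (the existence check in step 2 is not shown).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: only web links are accepted on the forum.
ALLOWED_SCHEMES = {"http", "https"}


def is_valid_uri(uri):
    """Return True if the URI parses, uses an allowed scheme and names a host."""
    try:
        parts = urlparse(uri.strip())
    except ValueError:
        # urlparse raises ValueError on some malformed inputs (e.g. bad IPv6 hosts).
        return False
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


class ImageCollector(HTMLParser):
    """Collect <img> tags, resolving each src against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        if src:
            self.images.append({
                "url": urljoin(self.base_url, src),
                # Declared sizes are optional and may be missing (None).
                "width": attributes.get("width"),
                "height": attributes.get("height"),
            })


def extract_images(html, base_url):
    """Return a list of image dicts found in the given HTML text."""
    parser = ImageCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.images
```

Note that the `width` and `height` attributes are often absent or misleading, so the real pixel dimensions would probably have to come from the image files themselves.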
Here comes the challenge. Before step 4, it would be great to sort this set of images in order of 'relevance'. I know that's quite an ambiguous goal :-) but let me explain what I currently get at step 4, so you can see why I'm looking for a solution.
Many times, I get these kinds of things in the set of images:
- Images used for the layout of the page (tiny and useless)
- Banners and ads
- Near-duplicate images (the original and a resized copy)
- Arbitrary ordering of the set (logo in last position, etc.)
I decided to clean up this mess by removing tiny images and sorting the rest by size, but I know that is far from a good solution.
Any ideas on that?
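The clean-up described above could look roughly like the following sketch. It assumes the pixel dimensions of each image are already known; the `Image` class, the `rank_images` function and the `MIN_SIDE` threshold are hypothetical names and values chosen for illustration. The optional `max_aspect` parameter is an extra guess of mine at catching banner-shaped images, not something the current approach does.

```python
from dataclasses import dataclass

# Assumed threshold: anything with a side shorter than this counts as "tiny".
MIN_SIDE = 50


@dataclass(frozen=True)
class Image:
    url: str
    width: int
    height: int

    @property
    def area(self):
        return self.width * self.height


def rank_images(images, min_side=MIN_SIDE, max_aspect=None):
    """Drop tiny images, then sort the rest by pixel area, largest first.

    If max_aspect is given, images whose long side exceeds max_aspect times
    the short side (typical of banners) are dropped too. min_side should be
    positive so that zero-sized images never reach the division below.
    """
    candidates = []
    for img in images:
        short, long_ = sorted((img.width, img.height))
        if short < min_side:
            continue  # layout spacers, icons, tracking pixels
        if max_aspect is not None and long_ / short > max_aspect:
            continue  # very wide or very tall: likely a banner
        candidates.append(img)
    return sorted(candidates, key=lambda img: img.area, reverse=True)


if __name__ == "__main__":
    found = [
        Image("spacer.gif", 1, 1),
        Image("photo.jpg", 800, 600),
        Image("thumb.jpg", 200, 150),
        Image("banner.png", 728, 90),
    ]
    for img in rank_images(found, max_aspect=3.0):
        print(img.url, img.area)
```

Sorting by area alone still leaves the near-duplicates and the ordering problem untouched, which is why this only counts as a first pass.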
Thank you very much!