
Hello!

Based on suggestions here @ SO, I have cataloged the average color for a set of stock images.
from PIL import Image
r, g, b = image.convert("RGB").resize((1, 1), Image.LANCZOS).getpixel((0, 0))  # Image.ANTIALIAS was removed in Pillow 10; LANCZOS is the same filter

Now, I would like to present a color wheel to the user and run a search against my catalog to find images that are the closest match to the color selected.

I have read through several questions posted here that recommend "finding the distance between two colors", and reference the Flickr Hacks book.

The Flickr Hacks distance algorithm seems to be basically:


diffr = checkImage.r - search_r
diffg = checkImage.g - search_g
diffb = checkImage.b - search_b
distance = diffr * diffr + diffg * diffg + diffb * diffb  # squared Euclidean distance (no sqrt)
matched = distance < threshold  # so threshold is effectively a squared radius


This method would require me to calculate the distance between my search color and every image's color fingerprint. I was wondering if there is a way to specify a "search area" based on the selected color (center point) and a predetermined threshold (or search radius), and then construct a SQL-like query that returns all images falling within this area.
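A search area like this can be expressed as an ordinary SQL query. The sketch below uses Python's built-in `sqlite3` and assumes a hypothetical `images(id, r, g, b)` table holding each image's average color; the table, column, and function names are illustrative, not from the question. The `BETWEEN` clauses describe the cube around the selected color (an index on the color columns may help the database narrow that down, depending on its query planner), and the final condition keeps only rows inside the sphere of the given radius.

```python
import sqlite3

# Hypothetical schema: one row per image with its average color (0..255 each).
SCHEMA = """
CREATE TABLE images (id INTEGER PRIMARY KEY, r INT, g INT, b INT);
CREATE INDEX images_rgb ON images (r, g, b);
"""

QUERY = """
SELECT id
FROM images
WHERE r BETWEEN :r - :rad AND :r + :rad          -- box around the search color
  AND g BETWEEN :g - :rad AND :g + :rad
  AND b BETWEEN :b - :rad AND :b + :rad
  AND (r - :r) * (r - :r) + (g - :g) * (g - :g) + (b - :b) * (b - :b)
      <= :rad * :rad                              -- exact sphere test
ORDER BY (r - :r) * (r - :r) + (g - :g) * (g - :g) + (b - :b) * (b - :b), id
"""


def make_catalog(rows):
    """Build an in-memory demo catalog from (id, r, g, b) rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO images VALUES (?, ?, ?, ?)", rows)
    return conn


def find_similar(conn, r, g, b, radius):
    """Return ids of images within `radius` of (r, g, b), closest first."""
    params = {"r": r, "g": g, "b": b, "rad": radius}
    return [row[0] for row in conn.execute(QUERY, params)]


if __name__ == "__main__":
    conn = make_catalog([(1, 200, 30, 40), (2, 190, 40, 50), (3, 20, 200, 30)])
    print(find_similar(conn, 198, 32, 42, 20))  # [1, 2]
```

The box conditions are redundant with the sphere test logically; they are there only so the database has simple range predicates it can use before evaluating the arithmetic.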

Is this possible??

BTW, I'm implementing this in Python using PIL and related libraries.

Thanks for your help, SO!

SR

A:

If it were me, I'd go a little less fancy and cache the searches in a secondary table, like:

CREATE TABLE `image_search` (
    `id` int not null auto_increment,
    `image_id` int not null,
    `r` tinyint unsigned not null,         -- unsigned so 0..255 fits
    `g` tinyint unsigned not null,
    `b` tinyint unsigned not null,
    `distance` smallint unsigned not null, -- RGB distances can exceed 255
    `hit` bool not null,
    PRIMARY KEY (`id`),
    UNIQUE KEY `image_id_by_rgb_by_distance` (`image_id`,`r`,`g`,`b`,`distance`),
    KEY `rgb_by_distance_by_hit` (`r`,`g`,`b`,`distance`,`hit`)
);

Pull from that to find your matches, like

SELECT `image_id`
FROM `image_search`
WHERE `r` = $r
AND `g` = $g
AND `b` = $b
AND `distance` = $distance
AND `hit` = 1

If you get no results, then do

SELECT `image_id`
FROM `image_search`
WHERE `r` = $r
AND `g` = $g
AND `b` = $b
AND `distance` = $distance

and if there are no results to that, then run through your image catalog doing the comparison and store each result, positive or negative, in the table.
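The lookup-then-fill logic described above might look roughly like this in Python, using `sqlite3` as a stand-in for MySQL. It assumes `distance` is a Euclidean radius in RGB units and that the catalog is available as a hypothetical dict mapping image ids to average colors; the surrogate `id` column from the schema above is omitted, and all function names are illustrative.

```python
import sqlite3


def create_cache(conn):
    """Create a cache table shaped like the answer's `image_search` table."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS image_search (
            image_id INT NOT NULL, r INT NOT NULL, g INT NOT NULL, b INT NOT NULL,
            distance INT NOT NULL, hit INT NOT NULL,
            PRIMARY KEY (image_id, r, g, b, distance))""")
    conn.execute("CREATE INDEX IF NOT EXISTS rgb_by_distance_by_hit"
                 " ON image_search (r, g, b, distance, hit)")
    return conn


def cached_search(conn, catalog, r, g, b, distance):
    """Return ids of images within `distance` of (r, g, b), caching every result.

    `catalog` is a hypothetical {image_id: (r, g, b)} dict of average colors.
    """
    key = (r, g, b, distance)
    hits = [row[0] for row in conn.execute(
        "SELECT image_id FROM image_search WHERE r = ? AND g = ? AND b = ?"
        " AND distance = ? AND hit = 1 ORDER BY image_id", key)]
    if hits:
        return hits
    # No hits: has this color/distance combination been computed before?
    seen = conn.execute(
        "SELECT 1 FROM image_search WHERE r = ? AND g = ? AND b = ?"
        " AND distance = ? LIMIT 1", key).fetchone()
    if seen:
        return []  # computed before; there really are no matches
    # Cache miss: compare against the whole catalog once and store every outcome.
    rows = []
    for image_id, (ir, ig, ib) in catalog.items():
        d2 = (ir - r) ** 2 + (ig - g) ** 2 + (ib - b) ** 2
        rows.append((image_id, r, g, b, distance, int(d2 <= distance * distance)))
    conn.executemany("INSERT INTO image_search VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return sorted(row[0] for row in rows if row[5])


if __name__ == "__main__":
    conn = create_cache(sqlite3.connect(":memory:"))
    catalog = {1: (200, 30, 40), 2: (20, 200, 30)}
    print(cached_search(conn, catalog, 198, 32, 42, 20))  # computes and caches: [1]
    print(cached_search(conn, catalog, 198, 32, 42, 20))  # served from the cache: [1]
```

Storing the negative results matters: without them, a color that genuinely matches nothing would trigger a full catalog scan on every request.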

Then it'll only be slow when it doesn't have the results cached. If your UI encourages the user to pick certain useful preset colors, you can precompute for those and help yet more.

Also bonus points for precomputing all previously executed searches when you add an image to your catalog.

chaos
chaos, if I understand you correctly, you are suggesting I do the comparison analysis up front and store the relationships in the DB. So for each color, I would already have a list of similar images, and the operation would be a simple lookup. That's pretty simple, and actually very clever; I'm going to give it a try. However, this would only be usable if the search starting point were an image already in my catalog: I would compute its average color, then run my search. I want to present a color wheel and show the best matches for a selected color. Any thoughts?
Shaheeb Roshan
That's why I'm speaking of it primarily as a caching mechanism. If the color/distance combination isn't in the table, then you compute and store it. Precomputation is only used to seed the cache with values you know users are likely to select.
chaos
A: 

You have to define the problem better. Do you want pictures with a similar average color, or pictures that have a center that has a similar average color? What do you want to achieve in the end? (similar average color isn't an amazing similarity criterion...)

Assaf Lavie
I want images whose average color is similar to the color selected using an HSV color wheel. The images are of clothing (specifically, models wearing clothing against neutral backgrounds). I want to be able to search this catalog of images using colors selected visually on a color wheel. Believe me, I know I could tag the images and categorize them a hundred different ways. I am hoping for a novel browsing mechanism (very similar to Etsy's) that is based on color. Does that help?
Shaheeb Roshan
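Since the catalog stores RGB averages but the picker is an HSV wheel, the selected color has to be converted before it can be compared. Python's standard `colorsys` module does this conversion; the sketch assumes the wheel reports hue in degrees and saturation/value in the 0..1 range, which depends on the widget actually used, and `wheel_to_rgb` is a hypothetical helper name.

```python
import colorsys


def wheel_to_rgb(hue_degrees, saturation, value):
    """Convert an HSV wheel selection to an (r, g, b) tuple in 0..255.

    Assumes hue in degrees and saturation/value in 0..1 (widget-dependent).
    """
    r, g, b = colorsys.hsv_to_rgb((hue_degrees % 360) / 360.0, saturation, value)
    return tuple(round(c * 255) for c in (r, g, b))


if __name__ == "__main__":
    print(wheel_to_rgb(0, 1.0, 1.0))    # (255, 0, 0)
    print(wheel_to_rgb(120, 1.0, 1.0))  # (0, 255, 0)
```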
Your average will include the background and will therefore probably not yield the results the user expects. I would suggest using a segmentation algorithm to separate foreground from background and using only the average color of the foreground. That should be easy with neutral backgrounds.
Smasher
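As a rough illustration of this suggestion (a crude heuristic, not a real segmentation algorithm), the sketch below estimates the background as the mean color of the border pixels and averages only the pixels that differ from it by more than a tolerance. The function name, the border heuristic, and the default tolerance of 30 are all assumptions; it only makes sense for plain, evenly lit backdrops where the subject does not touch the image edges.

```python
def foreground_average(pixels, width, height, tolerance=30):
    """Average color of pixels that differ from the estimated background.

    `pixels` is a row-major list of (r, g, b) tuples (for example, read from a
    PIL image). The background is estimated as the mean border color.
    """
    border = [pixels[y * width + x]
              for y in range(height) for x in range(width)
              if x in (0, width - 1) or y in (0, height - 1)]
    bg = [sum(p[i] for p in border) / len(border) for i in range(3)]
    limit = tolerance * tolerance
    # Keep pixels whose squared distance from the background exceeds tolerance^2.
    fg = [p for p in pixels
          if sum((p[i] - bg[i]) ** 2 for i in range(3)) > limit]
    if not fg:  # nothing stands out from the background
        fg = pixels
    return tuple(round(sum(p[i] for p in fg) / len(fg)) for i in range(3))


if __name__ == "__main__":
    grey, red = (240, 240, 240), (200, 0, 0)
    # 4x4 image: grey border around a 2x2 red patch in the middle.
    img = [grey] * 5 + [red] * 2 + [grey] * 2 + [red] * 2 + [grey] * 5
    print(foreground_average(img, 4, 4))  # (200, 0, 0); the plain mean is (230, 180, 180)
```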
A:

You can save significantly on computation by doing comparisons on each component instead of squaring to find the distance.

if (abs(check.r - search.r) < threshold and
        abs(check.g - search.g) < threshold and
        abs(check.b - search.b) < threshold):
    matched = True

Combining this with cache tables will probably be enough for whatever you're doing.

Kai
This is a box search; the sum of squares in the OP is a spherical test. I doubt there's any speed difference between the two methods (three 8-bit multiplies versus three adds and abs); your biggest overhead, as with most things these days, is memory bandwidth.
Skizz
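The difference between the two tests is easy to see on a concrete color. In the sketch below (hypothetical function names; colors are plain (r, g, b) tuples, and unlike the OP's code the threshold is treated as a radius, so the sphere test squares it), a color 15 units off on every channel passes the box test for a threshold of 20 but is about 26 units away, so the sphere test rejects it. Because every point inside the sphere is also inside the box, the box test can serve as a cheap prefilter in front of the exact test without losing matches.

```python
def in_box(check, search, threshold):
    """The per-component test from Kai's answer: a cube around `search`."""
    return all(abs(c - s) < threshold for c, s in zip(check, search))


def in_sphere(check, search, threshold):
    """Sum-of-squares test: Euclidean distance below `threshold`, without sqrt."""
    return sum((c - s) ** 2 for c, s in zip(check, search)) < threshold * threshold


def matches(check, search, threshold):
    """Box prefilter first; only colors inside the cube pay for the exact test."""
    return in_box(check, search, threshold) and in_sphere(check, search, threshold)


if __name__ == "__main__":
    search = (100, 100, 100)
    print(in_box((115, 115, 115), search, 20))     # True: 15 < 20 on each channel
    print(in_sphere((115, 115, 115), search, 20))  # False: 15 * sqrt(3) is about 26
```

In general the box accepts colors up to `threshold * sqrt(3)` away (its corners), so using it alone widens the match region by up to about 73% along the diagonals.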
Easily the simplest approach, and I am desperately hoping it will yield usable results. Thank you! I will try it shortly and post my results. I once heard in a TED video that the hobbyist had achieved 80% of the functionality of commercial products at about 2% of the cost. I love that mentality, and it's what I'm aiming for with my project: yes, the visual color search won't be perfect, but if it's mostly there with a few hours of effort, that will be plenty cool already! SR
Shaheeb Roshan
A: 

We can look at a color as a point in three-dimensional space. Now each image will be at a point in space defined by its average color. The user is choosing a point in 3-d space, and you want to find the image nearest that point.

This is not simple, but a lot of work has been done on it by people smarter than you or me (Don Knuth calls it the 'post office problem'). A good place to start is Wikipedia, as usual.
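One classic data structure for this nearest-neighbor ("post office") problem is a k-d tree, which here becomes a 3-d tree over (r, g, b). The pure-Python sketch below is a minimal illustration with hypothetical names, not a tuned implementation; for real use, a library structure such as SciPy's `scipy.spatial.cKDTree` may be preferable, and for a modest catalog a plain linear scan may already be fast enough.

```python
def build_kdtree(points, depth=0):
    """Build a 3-d tree from a list of ((r, g, b), image_id) pairs."""
    if not points:
        return None
    axis = depth % 3  # cycle through r, g, b as the splitting axis
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def sq_dist(a, b):
    """Squared Euclidean distance between two colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, target, best=None):
    """Return the ((r, g, b), image_id) pair closest to `target`."""
    if node is None:
        return best
    color = node["point"][0]
    if best is None or sq_dist(color, target) < sq_dist(best[0], target):
        best = node["point"]
    axis = node["axis"]
    diff = target[axis] - color[axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best)
    # Search the far side only if the splitting plane is closer than the best match.
    if diff * diff < sq_dist(best[0], target):
        best = nearest(far, target, best)
    return best


if __name__ == "__main__":
    catalog = {1: (200, 30, 40), 2: (20, 200, 30), 3: (0, 0, 250)}
    tree = build_kdtree([(color, image_id) for image_id, color in catalog.items()])
    print(nearest(tree, (190, 40, 50)))  # ((200, 30, 40), 1)
```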

David