I'm working on a project that involves scanning colors into RGB and then searching a database of RGB values for the most similar color. I've decided that the easiest way to define "similar" in this case is to represent the colors as points in three-dimensional space and then find the distance between the scanned point and each point in the database.

The first part, using 3D space and proximity, seems fine to me, but the second part seems like a bad idea: I shouldn't have to check the scanned color against every single point in the database, should I? Having never done any formal CS work, I don't know what the alternative is, but I have a distinct feeling that there must be a better way.

Or, to put it abstractly: I have some input data, a collection of stored data, and a function that tells me how similar any two data items are. What is the most efficient way to find the stored item most similar to the input?

Edit: I'm using Python for this, if anyone's curious.

+2  A: 

To address the abstract statement: Unless there is structure in the similarity function that is known a priori, there is no better method than "try everything".
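For reference, "try everything" is only a few lines of Python. This is a minimal sketch, assuming colors are stored as (r, g, b) tuples and that similarity means Euclidean distance; the names `squared_distance`, `nearest_color_linear`, and `reference_colors` are made up for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two (r, g, b) tuples.

    The square root is skipped because it does not change which
    point is closest.
    """
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_color_linear(query, colors):
    """Return the entry of `colors` closest to `query` by checking every entry."""
    return min(colors, key=lambda c: squared_distance(query, c))


# Hypothetical reference data, for illustration only.
reference_colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (250, 250, 250)]
print(nearest_color_linear((240, 10, 20), reference_colors))
```

The cost grows linearly with the size of the database, which may well be acceptable for a few thousand colors.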

Your problem is studied under the term "nearest neighbor search". For this problem a cover tree is vaguely suitable. This page has pointers to code. A kd-tree may also be suitable.
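To give a feel for the kd-tree idea, here is a rough pure-Python sketch, assuming Euclidean distance on (r, g, b) tuples. It is not taken from any particular library; `build_kdtree` and `nearest` are illustrative names, and the tree is built once up front and then queried many times.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two (r, g, b) tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points, depth=0):
    """Recursively build a kd-tree as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % 3  # cycle through the R, G, B axes
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2  # the median point splits the rest in two
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def nearest(node, query, depth=0, best=None):
    """Return the stored point closest to `query` (None for an empty tree)."""
    if node is None:
        return best
    point, left, right = node
    if best is None or squared_distance(query, point) < squared_distance(query, best):
        best = point
    axis = depth % 3
    diff = query[axis] - point[axis]
    # Descend first into the side of the splitting plane that holds the query.
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    # The far side can only hold a closer point if the splitting plane
    # itself is closer to the query than the best match found so far.
    if diff ** 2 < squared_distance(query, best):
        best = nearest(far, query, depth + 1, best)
    return best


# Hypothetical reference data, for illustration only.
reference_colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (250, 250, 250)]
tree = build_kdtree(reference_colors)
print(nearest(tree, (240, 10, 20)))
```

If SciPy is available, `scipy.spatial.KDTree` offers a ready-made kd-tree with a `query` method, which is probably preferable to a hand-rolled version.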

In the absence of further information about the structure or coverage of the database of reference colors, it's hard to make further suggestions. For instance, if the database is known to have a worst-case sparsity X, then the query can be constrained to those entries whose R component differs from the test color's R component by at most X, and likewise for the G and B components. This reduces the entire database to a smaller cubical section guaranteed to contain the result. (Using this method requires a proof that no point in the RGB cube is farther than X from some point in the database; for a fixed database, this needs to be established only once.)
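The cube restriction might look something like the sketch below. It assumes the sparsity bound X has already been proven for the database and is passed in as `max_gap`; that parameter and the function names are hypothetical. In this form the filter is itself a linear pass, so the real saving only appears when the store can answer the per-component range query from an index (for example, indexed R, G, and B columns in a SQL table).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two (r, g, b) tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_color_in_cube(query, colors, max_gap):
    """Search only entries whose R, G, and B each differ from `query` by at most max_gap.

    ASSUMPTION: max_gap is a proven sparsity bound, i.e. every possible
    query lies within max_gap of some entry in `colors`. If that does
    not hold, the result may not be the true nearest neighbor.
    """
    candidates = [
        c for c in colors
        if all(abs(q - x) <= max_gap for q, x in zip(query, c))
    ]
    if not candidates:
        # The bound was evidently wrong for this query; fall back to a full scan.
        candidates = colors
    return min(candidates, key=lambda c: squared_distance(query, c))


# Hypothetical reference data and bound, for illustration only.
reference_colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (250, 250, 250)]
print(nearest_color_in_cube((240, 10, 20), reference_colors, max_gap=30))
```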

Eric Towers
I was thinking about using something like a kd-tree, but I had no idea it existed or had a name. Thanks for the helpful pointers! I'll try to hack some things together.
NSU