views:

202

answers:

5

I have a "colors" table in my database.

The user enters a color trough the user interface, and the backend searches for the most similar looking color existing in the colors table, calculating the distance of the colors in the HCL space.

I will implement a caching algorithm, which should store the distance between previously calculated color distances, to avoid repeated math operations.

What is the best table layout for such purpose ?

A: 

I'm not too familiar with HCL, but based on the description at http://search.cpan.org/~mbarbon/Color-Similarity-HCL-0.04/lib/Color/Similarity/HCL.pm it seems that two colors are needed as an input for distance.

So I would think at least two sets of RGB and the according distance between them should be stored. I'm not sure of your use case, but if the range of choices is selected, you may want to store the user-selection as well.

It seems that there would only be a finite number of combinations though? It seems like you could do the math once for each combination, and just have a lookup table?

Yancy
You have misunderstood my question.Should I place the two color_ids and the distance between them in one row ? or place the distances and the colors in a seperate table ?
astropanic
+3  A: 

you could do this:

table colors(r,g,b)
table colordistance(user_r,user_g,user_b,r,g,b,distance)

but do you expect your users to keep entering the same numbers??? The maximum numbers of rows in this table is 16777216 if you only include the closest color.

I still suspect that the database access is slower than the calculation, so I'm thinking of the quote "premature optimization is the root of all evil".

I would run it without any caching of the calculation until I see it as an actual problem.

Osama ALASSIRY
+1  A: 

I assume that your color "distances" are calculated as something like:

sqrt((r1-r2)^2 + (g1-g2)^2 + (b1-b2)^2)

Assuming that you are using 8 bit pixels, there would be (256^3)^2 distinct mappings in your table. That's a LOT of table space. (You could probably compact it a lot, but ... see the next point.)

The other thing you need to consider is the cost of a database lookup to find a color distance versus the cost of doing the calculation. My guess would be that a database lookup would take a millisecond or more, but the metric calculation should take 1 microsecond or less.

All in all, using a database table sounds like a really bad idea to me.

Stephen C
A: 

Here's what I recommend:

table colors(color_id, color_name, r, g, b)

table color_distances(color_1_id, color_2_id, distance)

Indexes: PRIMARY (color_1_id, color_2_id) INDEX (color_1_id, distance, color_2_id)

color_distances would contain all possible color_id combinations, and would only be updated as needed.

Selecting would then be simple:

SELECT similar_colors.*
FROM colors as similar_colors, color_distances
WHERE color_distances.color_1_id = <selected_color_id>
ORDER BY color_distances.distance ASC
Dan
+3  A: 

As Osama said, this looks like premature optimization. Based on your description of the algorithm, I would:

  • Pre-calculate the HCL vectors for all of the colors in the database, and store a table which maps a color id to its HCL vector.
  • The table should be stored using the MySQL Spatial Extensions, which allow you to query for neighbors of a point.
  • When a new color is chosen, transform it to HCL, and query for neighbors of its point in HCL space.
  • If caching is needed at all, I would cache coarse-grained colors, so there is some chance that users revisit a previously selected color.
Yuval F