I have a fairly large number of photos and an RGB color map (say, about 100 colors). How can I group the pictures by color and get something like this: http://labs.ideeinc.com/multicolr ?

My current idea is this: Using ImageMagick, do this for each photo:

  1. Resize it to a smaller size so that it can be processed faster.
  2. Quantize it without dithering using my chosen color map.
  3. Get the photo's histogram to obtain how many times each color appears.
  4. Store the colors in a database, but I haven't figured out the best way to do this for fast retrieval.
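
Steps 2 and 3 can be sketched in a few lines. This is only an illustration in plain Python, not ImageMagick or PHP: the pixel list, the three-color palette and the function names are made up for the example, and "nearest color" here means smallest squared distance in RGB, which is one common choice and not necessarily what ImageMagick does internally.

```python
from collections import Counter

def nearest_color(pixel, palette):
    """Return the index of the palette entry closest to pixel
    (squared Euclidean distance in RGB)."""
    return min(
        range(len(palette)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(pixel, palette[i])),
    )

def color_percentages(pixels, palette):
    """Map every pixel to its nearest palette color (no dithering) and
    return {palette_index: percent_of_pixels}."""
    counts = Counter(nearest_color(px, palette) for px in pixels)
    total = len(pixels)
    return {idx: 100.0 * n / total for idx, n in counts.items()}

# Hypothetical 3-color palette and a tiny 4-pixel "image".
palette = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
pixels = [(250, 10, 5), (240, 20, 20), (10, 10, 200), (0, 250, 30)]
histogram = color_percentages(pixels, palette)
```

The resulting percentages per palette index are what would go into the database in step 4.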

Do you know a better, more efficient way to do this? My language of choice is PHP, since all the heavy processing will be done by ImageMagick, and the database is PostgreSQL. Thank you in advance!

+1  A: 

I notice you have already figured out how to get the most relevant colors from the image. Don't shrink the images too much, though, because the histogram of a heavily downsized image may differ from that of the original.

The database could look something like this:

image table:

image_id | image_file

color table:

color_id | color_rgb

image_color table:

image_id | color_id | color_percent

The color_percent column will be used for grouping and in WHERE clauses.

Getting images:

select
    image_id,
    sum(color_percent)/count(color_percent) as relevance
from
    image_color
where
    color_id IN (175, 243) -- the colors you want to involve in this search
    and color_percent > 10 -- this will drop results with lower significance
group by
    image_id
order by
    relevance desc -- most relevant images first
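
To try the schema and query without a PostgreSQL server, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in. The image_color layout follows the table above; the sample rows, the descending sort and the use of AVG (equivalent to sum/count here) are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image_color (image_id INTEGER, color_id INTEGER, color_percent REAL)"
)

# Hypothetical sample data: (image_id, color_id, color_percent).
rows = [
    (1, 175, 60.0), (1, 243, 30.0),
    (2, 175, 99.0), (2, 243, 1.0),
    (3, 175, 20.0), (3, 243, 15.0),
]
conn.executemany("INSERT INTO image_color VALUES (?, ?, ?)", rows)

query = """
    SELECT image_id, AVG(color_percent) AS relevance
    FROM image_color
    WHERE color_id IN (175, 243)   -- colors involved in the search
      AND color_percent > 10       -- drop low-significance matches
    GROUP BY image_id
    ORDER BY relevance DESC
"""
results = conn.execute(query).fetchall()
```

With this data, image 2 comes out on top even though only one of its two colors passes the 10% threshold. If both colors must be present, adding `HAVING COUNT(*) = 2` after the `GROUP BY` is one possible way to enforce that.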
narcisradu
I think the color_id is a little bit extra. If its just referencing a color_rgb, I don't think you need a separate key.
rfusca
or it might reference the color name
narcisradu
I'm going to mark this as the accepted answer, because this is what I ended up doing. It isn't perfect, but I like it and it was pretty fun to do. :) Check it out: http://www.picof.net/colors/ . Problems: I don't know how to select photos with more than one color and order them by something meaningful. I tried ordering them by (color_A_percent + color_B_percent + ...), but then I end up getting photos that have 99% color_A and 1% color_B.
bilygates
I just made an edit. Let me know if my query is useful. Bafta!
narcisradu
@narcisradu Thank you again. Your query worked, but I've figured out a better solution using a 3-dimensional index (PostgreSQL's 'Cube' contrib module). It's much more precise this way, and one could query any RGB color, not just the ones from the palette. It's online, if you want to have a look.
bilygates
+1  A: 

Colours are essentially three-dimensional vectors (regardless of whether they are represented as HSV, RGB or CMY[K]). Unfortunately, relational databases mostly aren't very good at working in more than one dimension.

If you reduce the image down to a single "average" colour then the solution becomes a bit simpler. A trivial analysis would imply that you need to compare a new image with every existing image to determine the level of similarity. However, a better approach would be to digitise (quantise) the vector to a lower precision and then find matching values in the database.

e.g. the 24-bit colour 124, 39, 201 becomes 0,0,1 as 1-bit-per-channel colour, 1,0,3 as 2-bit-per-channel colour, and so on.
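
A minimal sketch of that reduction, assuming 8 bits per channel and simply keeping the top bits of each one (the function name is made up for the example):

```python
def reduce_depth(rgb, bits):
    """Keep only the top `bits` bits of each 8-bit channel."""
    shift = 8 - bits
    return tuple(channel >> shift for channel in rgb)

color = (124, 39, 201)
one_bit = reduce_depth(color, 1)   # coarsest bucket
two_bit = reduce_depth(color, 2)   # slightly finer bucket
```

For 124, 39, 201 this gives 0,0,1 at 1 bit per channel and 1,0,3 at 2 bits per channel. Storing the reduced value in an ordinary indexed column would let the database find images in the same bucket with a plain equality lookup.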

If you want to look at more colours in the image, then I'd recommend reducing down to the nearest values of a fixed colour map without error-propagation and identifying the top 'N' most frequently used colours. What you do after that would require some trial and error - the method above, weighted for frequency in the interim image, might be necessary, or you might just get away with looking at the images where the top N-M colours match N-X of your calculated values (with some tweaking of the M and X values).
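
One rough way to picture the "top N colours" comparison, assuming each image has already been reduced to a list of colour-map indices (the sample lists and function names are hypothetical, and this ignores the frequency weighting mentioned above):

```python
from collections import Counter

def top_colors(palette_indices, n):
    """Return the set of the n most frequent colour-map indices in an image."""
    return {idx for idx, _ in Counter(palette_indices).most_common(n)}

def shared_colors(a, b, n=2):
    """Count how many of the top-n colours two images have in common."""
    return len(top_colors(a, n) & top_colors(b, n))

# Hypothetical images, one colour-map index per pixel.
image_a = [1, 1, 1, 2, 2, 5]
image_b = [2, 2, 2, 1, 1, 9]
image_c = [4, 4, 4, 6, 6, 8]

ab = shared_colors(image_a, image_b)  # both dominated by colours 1 and 2
ac = shared_colors(image_a, image_c)  # no dominant colours in common
```

How many shared colours count as a match is exactly the part that needs tuning.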

C.

symcbean
I've found a PostgreSQL module called "cube" that can deal with multi-dimensional indexes: http://www.postgresql.org/docs/8.4/static/cube.html . I think I'll try it, since it should make it possible to select photos based on any RGB color, not just specific ones from a limited palette.
bilygates
Cool - I've learnt something new.
symcbean