views:

209

answers:

2

I have a table where I'm inserting, for instance, images and the names of the colors found in said images. The color string looks something like "white, yellow, orange, black".

Since I have a lot of these, the 50% threshold is starting to drop off some colors, since they appear on most of the rows.

The whole point of the table is to be able to search by color. Does anyone have a workaround, or should I just give up and go for Sphinx or something similar? Recompiling MySQL is probably not an option.

I've heard that some people just add dummy rows to the table to bypass the 50%, but that sounds pretty desperate.

+4  A: 

This is NOT a problem to be solved with full text search.

Instead, you need a child table with one row per color per image.

The color should be a foreign key to a color table, so your child table actually becomes a many-to-many relation between image and color.

create table color ( 
 id int not null primary key auto_increment,
 name varchar64)
); 

create table image_color (
  image_id int references image(id), 
  color_id int reference color(id),
  unique constraint (image_id, color_id) 
) ;

Then index the relation (and give it a unique constraint on the tuple (image_id, color_id).

Then to find all images with a particular color:

select a.* from image a 
join image_color b on (a.id = b.image_id) 
join color c on (b.color_id = c.id)
where c.name = 'yellow';
tpdi
+1 for writing what I wanted to write, only faster
David Schmitt
+1  A: 

The whole point of the table is to be able to search by color. Does anyone have a workaround

Yes, use BOOLEAN MODE searches, which are not affected by the 50% threshold and are less unpredictable.

But, +1 to tpdi's answer, I can't imagine why you're using fulltext search instead of a simple join table listing the colours. It's a simple yes/no logical storage problem, there's no need to drag in the fulltext complexities of word-splitting, word ordering, stopwords, too-short words (“red” will not be indexed by default) and most of all the fact that you have to use nasty old MyISAM tables to get the feature at all!

Fulltext search is hard in general, and not particularly well-implemented in MySQL. It should be your last resort for when you really do need to search for words in large stretches of text, and not as a feature of choice.

bobince
Boolean mode returns results that are basically worthless.
kari.patila
Why are they “worthless”? What ‘wrong’ results are you seeing? It works fine for me.
bobince