I have a table in my database called product_tags with two fields: tag_id and tag_name.

Here is the schema:

CREATE TABLE `product_tags` (
 `tag_id` int(11) NOT NULL auto_increment,
 `tag_name` varchar(255) NOT NULL,
 PRIMARY KEY  (`tag_id`),
 UNIQUE KEY `tag_name` (`tag_name`)
) ENGINE=MyISAM AUTO_INCREMENT=84 DEFAULT CHARSET=utf8

Say it contains these tags:

  • yellow gold
  • yellow diamond
  • white gold
  • rose gold
  • band
  • diamond
  • blue diamond
  • pink diamond
  • black diamond

I want to run a search on the string "yellow gold diamond band".

I only want to pull the following tags:

  • yellow gold
  • band
  • diamond

Only those tags appear in the string exactly. "yellow" and "diamond" are both in the string but not next to each other, so the yellow diamond tag should be ignored.


Additionally, if possible:

If I did the search for "yellow gold blue diamond band"

I only want to pull the following tags:

  • yellow gold
  • band
  • blue diamond

The diamond tag would be ignored because the blue diamond tag would be the match.


How can I do this?

+3  A: 

edit:

select
   *
from 
   product_tags P
where
   INSTR('yellow gold diamond band', P.tag_name) > 0
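
One caveat worth noting: INSTR matches raw substrings, so a tag such as band would also be found inside a word like "husband". A common workaround is to pad both the search string and the tag name with spaces so that only whole-word matches count. Below is a minimal sketch of that idea in Python, using an in-memory SQLite database as a stand-in for MySQL (my assumption, purely so the example is self-contained; SQLite concatenates with || where MySQL would use CONCAT). The helper name find_tags is hypothetical.

```python
import sqlite3

TAGS = ["yellow gold", "yellow diamond", "white gold", "rose gold", "band",
        "diamond", "blue diamond", "pink diamond", "black diamond"]

def find_tags(search):
    """Return the tags that occur in `search` as whole words, sorted by name."""
    # Assumption: SQLite stands in for MySQL here.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product_tags ("
                 "tag_id INTEGER PRIMARY KEY, tag_name TEXT NOT NULL UNIQUE)")
    conn.executemany("INSERT INTO product_tags (tag_name) VALUES (?)",
                     [(t,) for t in TAGS])
    # Padding both sides with spaces forces matches onto word boundaries.
    rows = conn.execute(
        "SELECT tag_name FROM product_tags "
        "WHERE INSTR(' ' || ? || ' ', ' ' || tag_name || ' ') > 0 "
        "ORDER BY tag_name",
        (search,),
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]

print(find_tags("yellow gold diamond band"))
print(find_tags("yellow gold husband"))
```

This still returns diamond alongside blue diamond for the second search string; the padding only fixes the word-boundary problem.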
vulkanino
+1 Wow this is great!!
John Isaacks
That answered the first question, but running the query with the second input I also get the "diamond" tag.
vulkanino
How about a table with "inheritance" or something? i.e. a table with two columns, both foreign keys into your tag table, specifying that one supersedes the other. For example, "blue diamond" supersedes "diamond". So if you get the result and you see that you have "blue diamond", you remove "diamond" from your results.
EboMike
I noticed that too, still very good. Brian had a suggestion on how to remove the shorter duplicated word ("diamond") from the list.
John Isaacks
Actually Brian's and @EboMike's suggestions are both good, but in the case of a search string being "blue diamond ring with diamond accents" I would want both tags. Hmmm, not really sure what to do about that... maybe better to just leave them in. Perhaps just give blue diamond more "weight/search relevance" than diamond, since it is a narrower category.
John Isaacks
+1  A: 

Intuitively you could build an algorithm that iterates over all of the possible word combinations formed by contiguous words within the search phrase, and then find which of those is in your tag table. For instance:

yellow gold blue diamond band

Your possible combinations of contiguous words would be:

  • yellow
  • gold
  • blue
  • diamond
  • band
  • yellow gold
  • gold blue
  • blue diamond
  • diamond band
  • yellow gold blue
  • gold blue diamond
  • blue diamond band
  • yellow gold blue diamond
  • gold blue diamond band
  • yellow gold blue diamond band

From this entire list, the only terms that match your original list are:

  • diamond
  • yellow gold
  • blue diamond
  • band

From this list you could cull any items that repeat the same word, favoring the longer option over the shorter on the assumption that the longer option is more descriptive. After removing those terms you have:

  • yellow gold
  • blue diamond
  • band

This looks like the list you want. This approach works, but it becomes sluggish as the number of terms in a search phrase increases: a phrase of n words yields n(n+1)/2 contiguous combinations. Your 5 terms generated 15 potential tag searches; 10 words would generate 55.
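
The enumerate-then-cull idea can be sketched roughly as follows in Python. The function names are hypothetical, and the tags are held in an in-memory set rather than queried from the table. One deliberate departure, flagged as my assumption: instead of culling by repeated word, the sketch culls a match only when its word positions lie inside a longer match. That way a search like "blue diamond ring with diamond accents" (raised in the comments on another answer) keeps both blue diamond and the later standalone diamond.

```python
TAGS = {"yellow gold", "yellow diamond", "white gold", "rose gold", "band",
        "diamond", "blue diamond", "pink diamond", "black diamond"}

def contiguous_phrases(words):
    """Yield (start, end, phrase) for every run of adjacent words."""
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            yield start, end, " ".join(words[start:end])

def match_tags(search, tags):
    """Return tags found in `search`, dropping matches nested inside a longer match."""
    words = search.split()
    hits = [(s, e, p) for s, e, p in contiguous_phrases(words) if p in tags]
    kept = [
        (s, e, p) for s, e, p in hits
        # Cull a hit when a strictly longer hit covers the same word positions.
        if not any(s2 <= s and e <= e2 and (e2 - s2) > (e - s)
                   for s2, e2, _ in hits)
    ]
    # Sorting by start position returns tags in the order they appear.
    return [p for _, _, p in sorted(kept)]

print(match_tags("yellow gold blue diamond band", TAGS))
print(match_tags("blue diamond ring with diamond accents", TAGS))
```

In practice the set lookup would be replaced by a single query against product_tags for all candidate phrases.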

Therefore, my honest recommendation is that you use some sort of punctuation to separate tags within a search. That makes it easy to find tags: simply split the search phrase on the punctuation and search on the resulting terms, like this:

yellow gold, blue diamond, band

With a comma-delimited list, you now only have 3 search terms rather than 15, making it much easier to search your table of tags.
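
As a rough illustration of the comma-delimited approach, here is a sketch in Python that splits the phrase and looks up all terms with one IN query. It assumes an in-memory SQLite table as a stand-in for the MySQL one, and make_demo_db and tags_from_delimited are hypothetical helper names.

```python
import sqlite3

def make_demo_db(tags):
    """Build a throwaway product_tags table (SQLite stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product_tags ("
                 "tag_id INTEGER PRIMARY KEY, tag_name TEXT NOT NULL UNIQUE)")
    conn.executemany("INSERT INTO product_tags (tag_name) VALUES (?)",
                     [(t,) for t in tags])
    return conn

def tags_from_delimited(conn, search):
    """Split `search` on commas and return the terms that exist as tags."""
    terms = [t.strip() for t in search.split(",") if t.strip()]
    if not terms:
        return []
    # One "?" placeholder per term keeps the query parameterized.
    placeholders = ", ".join("?" for _ in terms)
    rows = conn.execute(
        f"SELECT tag_name FROM product_tags WHERE tag_name IN ({placeholders})",
        terms,
    ).fetchall()
    found = {name for (name,) in rows}
    # Preserve the order in which the user typed the terms.
    return [t for t in terms if t in found]

conn = make_demo_db(["yellow gold", "blue diamond", "band", "diamond"])
print(tags_from_delimited(conn, "yellow gold, blue diamond, band"))
```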

Brian Driscoll
A: 

You could probably do something like:

WHERE @searchTerm LIKE CONCAT('%', tag_name, '%')

Not very efficient for lots of tags, but it would work in the simple cases given.

EvilRyry
A: 

I can't think of any good way to do this in SQL directly.

However, if I were to implement it in my application logic, the pseudo-logic would probably look like this:

1. Split the search string "yellow gold diamond band" on the " " character into string[] search
2. Take the first value from the array, i.e. "yellow" in this case.
3. Run SELECT * FROM product_tags WHERE tag_name LIKE 'yellow%'
4. This will return "yellow gold" and "yellow diamond"
5. Loop through each of the results from step 4
   a. Split the result on " " into string[] result
   b. If the split array has a count of 1, we found an exact match for "yellow". No need to search further
   c. If the length of the array is > 1, compare search[1] with result[1], and so on, until either the split array is exhausted (a match) or a word differs (no match)
   d. If more than one match has been found, the longest match is used
6. Go back to step 2 and repeat for the next string, i.e. search[1]
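
For what it's worth, the steps above might look something like this in Python. It is only a sketch, and it assumes an in-memory SQLite table in place of MySQL; make_demo_db and scan_for_tags are hypothetical names. It departs from step 6 in one way, which is my assumption and not part of the steps: after a match it skips past the words just consumed, so "diamond" is not reported again right after "blue diamond".

```python
import sqlite3

def make_demo_db(tags):
    """Build a throwaway product_tags table (SQLite stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product_tags ("
                 "tag_id INTEGER PRIMARY KEY, tag_name TEXT NOT NULL UNIQUE)")
    conn.executemany("INSERT INTO product_tags (tag_name) VALUES (?)",
                     [(t,) for t in tags])
    return conn

def scan_for_tags(conn, search):
    """Walk the search words left to right, taking the longest tag at each position."""
    words = search.split()                      # step 1
    found = []
    i = 0
    while i < len(words):
        # Step 3: fetch every tag that starts with the current word.
        # Note: % or _ inside the word would act as LIKE wildcards.
        rows = conn.execute(
            "SELECT tag_name FROM product_tags WHERE tag_name LIKE ? || '%'",
            (words[i],),
        ).fetchall()
        best = None
        for (tag,) in rows:                     # step 5
            tag_words = tag.split()             # step 5a
            # Steps 5b-5c: every word of the tag must line up with the search.
            if words[i:i + len(tag_words)] == tag_words:
                # Step 5d: keep the longest match.
                if best is None or len(tag_words) > len(best):
                    best = tag_words
        if best:
            found.append(" ".join(best))
            i += len(best)   # assumption: skip the words already matched
        else:
            i += 1           # step 6: move on to the next word
    return found

conn = make_demo_db(["yellow gold", "yellow diamond", "white gold", "rose gold",
                     "band", "diamond", "blue diamond", "pink diamond",
                     "black diamond"])
print(scan_for_tags(conn, "yellow gold blue diamond band"))
```

This issues one query per unmatched word, so it scales with the length of the search phrase, not with the number of possible word combinations.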
InSane