I'm writing some code to automatically tag some articles.

I have an array of tags and a table of articles. I run the following query to check for headlines matching a tag:

SELECT headline 
  FROM `news` 
 WHERE MATCH(headline) AGAINST ("+Green +Day" IN BOOLEAN MODE)

I expected this to find all articles with the exact phrase 'Green Day' in the headline; without the first +, I get articles that contain just the word 'Green'.
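For context, in boolean mode a leading `+` only means "this word must be present". It says nothing about order or adjacency, so `+Green +Day` is not a phrase search. The sketch below roughly emulates that rule in plain Python. The tokenizer is an approximation of MySQL's full-text parser: it ignores stopwords and minimum word length. It also assumes the tag "Die! Die! Die!" was turned into three `+Die!` terms. `words` and `matches_all_required` are hypothetical helper names.

```python
import re


def words(text: str) -> list[str]:
    # Rough stand-in for the full-text parser: lower-case and split on runs of
    # non-word characters, so "Die!" becomes "die". Stopwords and minimum word
    # length are NOT modelled here.
    return re.findall(r"\w+", text.lower())


def matches_all_required(headline: str, query_words: list[str]) -> bool:
    # "+w1 +w2 ..." in boolean mode: every word must occur somewhere in the
    # headline, in any order and at any distance from each other.
    present = set(words(headline))
    return all(w.lower() in present for w in query_words)


# "+Green +Day" also matches headlines where the words are far apart.
print(matches_all_required("Green Day announce tour", ["Green", "Day"]))
print(matches_all_required("A green light on polling day", ["Green", "Day"]))
# "Die! Die! Die!" collapses to the single required word "die".
print(matches_all_required("Die Hard sequel confirmed", words("Die! Die! Die!")))
```

All three calls return True, which is the "every headline with the word 'die'" behaviour described next.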

This isn't perfect, and some tags give inaccurate results. For example, a tag called "Die! Die! Die!" (don't ask) returns every headline with the word 'die' in it.

Is there something obvious I'm missing here? All I want is the headlines that contain the entire phrase, exactly as it's entered.

A: 

If you want to match the entire phrase, you could do something like:

SELECT headline FROM news WHERE headline LIKE '%Green Day%'

That will return rows with the phrase "Green Day" in the headline.

webdestroya
Nope, I tried this first. It works fine for "Green Day", but if I have a tag like "Error", it also matches headlines with the word "Terror".
Matt Andrews
@Matt - then you should try a regex search with word boundaries.
webdestroya
@webdestroya ... or use a full-text index :)
Pekka
@Pekka - Hah, my bad. I can't believe I didn't think to wrap it in quotes.
webdestroya
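To show why `LIKE '%Error%'` also hits "Terror" and how word boundaries avoid it, here is a minimal sketch. It uses an in-memory SQLite database as a stand-in for the MySQL `news` table. SQLite has no built-in `REGEXP`: `X REGEXP Y` calls a user function `regexp(Y, X)`, so one is registered here using Python's `re`. In MySQL itself, the word-boundary form would be `headline REGEXP '\\bError\\b'` on 8.0.4 and later (ICU regex library), or `'[[:<:]]Error[[:>:]]'` on older versions. The sample headlines and the helper names `like_search`/`word_search` are made up for illustration.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")


def regexp(pattern: str, value: str) -> bool:
    # SQLite evaluates "value REGEXP pattern" as regexp(pattern, value).
    # Case-insensitive here; in MySQL this depends on the column's collation.
    return value is not None and re.search(pattern, value, re.IGNORECASE) is not None


conn.create_function("REGEXP", 2, regexp)

conn.execute("CREATE TABLE news (headline TEXT)")
conn.executemany(
    "INSERT INTO news (headline) VALUES (?)",
    [("Error found in census data",), ("Terror alert raised",), ("Green Day announce tour",)],
)


def like_search(tag: str) -> list[str]:
    # Plain substring match: '%Error%' is also found inside 'Terror'.
    rows = conn.execute(
        "SELECT headline FROM news WHERE headline LIKE ? ORDER BY rowid", (f"%{tag}%",)
    )
    return [r[0] for r in rows]


def word_search(tag: str) -> list[str]:
    # \b on both sides keeps the tag from matching inside a longer word;
    # re.escape stops characters like '!' or '.' in the tag acting as regex syntax.
    pattern = r"\b" + re.escape(tag) + r"\b"
    rows = conn.execute(
        "SELECT headline FROM news WHERE headline REGEXP ? ORDER BY rowid", (pattern,)
    )
    return [r[0] for r in rows]


print(like_search("Error"))
print(word_search("Error"))
```

`like_search("Error")` returns both the "Error" and the "Terror" headline, and `word_search("Error")` returns only the first. Note that `\b` assumes the tag starts and ends with a word character. A tag ending in "!", like "Die! Die! Die!", would need a different boundary rule.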
A:

As far as I can see in the docs, wrapping the phrase in double quotes should be enough. From the examples on the MySQL boolean full-text search docs page:

"some words"

Find rows that contain the exact phrase “some words” (for example, rows that contain “some words of wisdom” but not “some noise words”). Note that the “"” characters that enclose the phrase are operator characters that delimit the phrase. They are not the quotation marks that enclose the search string itself.
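A rough sketch of what the `"..."` operator changes, compared with `+Green +Day`. The phrase's words must appear next to each other and in order, so the docs' two examples come out differently. It also shows one way to build the `AGAINST` argument for each tag in the array. The tokenizer is an approximation that ignores stopwords and minimum word length. `contains_phrase`, `boolean_phrase` and `QUERY` are hypothetical names. The `%s` placeholder assumes a MySQL driver that uses that parameter style, such as PyMySQL.

```python
import re


def words(text: str) -> list[str]:
    # Approximate full-text tokenizer: lower-case, split on non-word characters.
    return re.findall(r"\w+", text.lower())


def contains_phrase(text: str, phrase: str) -> bool:
    # Emulates '"some words"': the phrase's words must appear adjacent and in order.
    target = words(phrase)
    tokens = words(text)
    n = len(target)
    return n > 0 and any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


def boolean_phrase(tag: str) -> str:
    # Wrap the tag in the " phrase operator. Any " inside the tag is replaced so
    # it cannot close the phrase early.
    return '"' + tag.replace('"', " ") + '"'


# Passed as a bound parameter, so the tag's quotes never clash with SQL quoting.
QUERY = "SELECT headline FROM news WHERE MATCH(headline) AGAINST (%s IN BOOLEAN MODE)"

print(contains_phrase("some words of wisdom", "some words"))
print(contains_phrase("some noise words", "some words"))
print(boolean_phrase("Green Day"))
# for tag in tags:
#     cursor.execute(QUERY, (boolean_phrase(tag),))
```

The first call prints True and the second False, matching the docs' example. With the phrase operator, "Die! Die! Die!" no longer matches a headline that merely contains "die" once.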

Pekka
You're correct - can't believe I missed this. Thank you!
Matt Andrews