views:

310

answers:

4

We're using SQL Server 2005 in a project. The users of the system have the ability to search some objects by using 'keywords'. The way we implement this is by creating a full-text catalog for the significant columns in each table that may contain these 'keywords' and then using CONTAINS to search for the keywords the user inputs in the search box in that index.

So, for example, let say you have the Movie object, and you want to let the user search for keywords in the title and body of the article, then we'd index both the Title and Plot column, and then do something like:

SELECT * FROM Movies WHERE CONTAINS(Title, keywords) OR CONTAINS(Plot, keywords)

(It's actually a bit more advanced than that, but nothing terribly complex)

Some users are adding numbers to their search, so for example they want to find 'Terminator 2'. The problem here is that, as far as I know, by default SQL Server won't index short words, thus doing a search like this:

SELECT * FROM Movies WHERE CONTAINS(Title, '"Terminator 2"')

is actually equivalent to doing this:

SELECT * FROM Movies WHERE CONTAINS(Title, '"Terminator"') <-- notice the missing '2'

and we are getting a plethora of spurious results.

Is there a way to force SQL Server to index small words? Preferably, I'd rather index only numbers like 1, 2, 21, etc. I don't know where to define the indexing criteria, or even if it's possible to be as specific as that.

Any help will be appreciated! Thanks in advance.

+2  A: 

These "small words" are considered "noise words" by the full text index. You can customize the list of noise words. This blog post provides more details. You need to repopulate your full text index when you change the noise words file.

Darren Gosbell
A: 

Well, I did that, removed the "noise-words" from the list, and now the behaviour is a bit different, but still not what you'd expect.

A search won't for "Terminator 2" (I'm just making this up, my employer might not be really happy if I disclose what we are doing... anyway, the terms are a bit different but the principle the same), I don't get anything, but I know there are objects containing the two words.

Maybe I'm doing something wrong? I removed all numbers 1 ... 9 from my noise configuration for ENG, ENU and NEU (neutral), regenerated the indexes, and tried the search.

dguaraglia
A: 

I knew about the noise words file, but I'm not why your "Terminator 2" example is still giving you issues. You might want to try asking this on the MSDN Database Engine forum where people that specialize in this sort of thing hang out.

Darren Gosbell
A: 

You can combine CONTAINS (or CONTAINSTABLE) with simple where conditions:

SELECT * FROM Movies WHERE CONTAINS(Title, '"Terminator 2"') and Title like '%Terminator 2%'

While the CONTAINS find all Terminator the where will eliminate 'Terminator 1'.

Of course the engine is smart enough to start with the CONTAINS not the like condition.

xnagyg