I have a problem with a query in MS SQL Server. I have a full-text index on a column called "col1". The data in this column can get quite large (20-30 KB or more). I now want to search this column for an exact phrase.

I have been told that the "contains" function is the fastest function for this, but I am aware of at least 2 other ways of doing this; using the "like" function, and using "charindex".

The problem is that "contains" doesn't work when I am searching for a phrase which contains a # symbol. For example, "... WHERE contains(col1, '"query string#"') ..." will always return 0 results.

I have switched to using charindex, and that does return results, but it takes a lot longer to query the database using this function.

Is there any way to either speed this query up or get the contains function to accept my # symbol?

Thanks for your time...

Update: I've decided to switch between charindex and the contains function. If the query data contains the # symbol, I use charindex; for all other queries, I use contains. That seems to work best.
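The switch described in the update can be sketched roughly as follows. This is only an illustration, not the asker's actual code: the table name `docs`, the helper name `build_search`, and the `?` placeholder style (as used by ODBC-based drivers) are all assumptions, and embedded double quotes in the phrase are not handled.

```python
def build_search(phrase):
    """Return (sql, params) for an exact-phrase search on col1.

    Hypothetical helper: `docs` is an assumed table name and `?` an
    assumed placeholder style.
    """
    if "#" in phrase:
        # The full-text predicate finds nothing for phrases with '#',
        # so fall back to the slower CHARINDEX scan.
        return ("SELECT * FROM docs WHERE CHARINDEX(?, col1) > 0", (phrase,))
    # Wrap the phrase in double quotes so CONTAINS treats it as a phrase.
    quoted = '"' + phrase + '"'
    return ("SELECT * FROM docs WHERE CONTAINS(col1, ?)", (quoted,))


if __name__ == "__main__":
    for p in ("query string", "query string#"):
        print(build_search(p))
```

Passing the phrase as a parameter rather than concatenating it into the SQL text also avoids quoting problems with user-supplied input.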

A: 

Have you tried putting the search string in a variable and then searching on the variable?

DForck42
+2  A: 

We have a similar problem with our own implementation of FTS. This is because Microsoft's indexing routine strips out a lot of special characters and common words.

In our situation, we have control over the input and pass all the text through a function that translates special characters such as your hash symbol. So the input into the database for the hash symbol may look like this "zxzHASHyxy".

We can then substitute our translated version for the "real" version when performing searches.
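A minimal sketch of such a translation layer might look like this. The function names `encode` and `decode` and the `TOKENS` table are hypothetical; only the "zxzHASHyxy" token comes from the description above.

```python
# Assumed mapping from special characters to index-safe tokens.
TOKENS = {"#": "zxzHASHyxy"}


def encode(text):
    """Replace special characters with tokens before storing or searching."""
    for char, token in TOKENS.items():
        text = text.replace(char, token)
    return text


def decode(text):
    """Restore the original characters when displaying stored text."""
    for char, token in TOKENS.items():
        text = text.replace(token, char)
    return text


if __name__ == "__main__":
    stored = encode("query string#")
    print(stored)
    print(decode(stored))
```

The same `encode` call would have to be applied both to the text written to the indexed column and to every search phrase, so that the two stay consistent.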

There is, however, quite a big downside to this implementation. If you need to keep a copy of the untranslated text, you'll have to do so in a separate column, and that's going to lead to a lot of bloat in your database.

Use this solution sparingly.

Sonny Boy
That's quite interesting, never thought of that, but that won't work in my case; the database is used for several different applications, and I can't go around modifying the values in the database.
+1  A: 

Special characters like "#" are word breakers and do not get included in the index. To the full-text index, 'query string#xyz' looks like 'query string xyz'.

You could try to use the FREETEXT function:

Full-text queries using FREETEXT are less precise than those full-text queries using CONTAINS. The SQL Server full-text search engine identifies important words and phrases. No special meaning is given to any of the reserved keywords or wildcard characters that typically have meaning when specified in the parameter of the CONTAINS predicate.

splattne
Maybe I'm just not enough of an SQL guru, but it seems to me that the contains feature as implemented in MS SQL Server is quite flawed. If I want to search for an exact phrase, there should be a (fast) option for that, never mind what MS thinks should be word breaks or whatever. I can see that FREETEXT may be useful for fuzzy matching, but contains is still too fuzzy for me; the way they've built these 2 functions is quite useless for my type of query.
A: 

Have you performed any testing using the LIKE operator/predicate instead of the CHARINDEX() function? It would be my expectation that LIKE would be faster than CHARINDEX(), but I don't have any evidence or documentation to back that up.

Additionally:

  • is the # symbol itself actually important in the query?
  • if it is, could you use a two-stage affair where you use SQL's CONTAINS() to retrieve a list of all records that contain the query string (with or without the #) and then an application-side test to remove entries that do not have the #?
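The two-stage idea in the second bullet could be sketched along these lines. The helper names are hypothetical, the rows are assumed to be the `col1` values already returned by the CONTAINS() query, and the case-insensitive comparison is an assumption about the desired matching.

```python
def first_stage_phrase(phrase):
    """Drop '#' and collapse whitespace to get the phrase for CONTAINS()."""
    return " ".join(phrase.replace("#", " ").split())


def post_filter(rows, phrase):
    """Keep only rows whose text really contains the exact phrase, '#' included.

    Case-insensitive comparison is an assumption made for this sketch.
    """
    needle = phrase.lower()
    return [text for text in rows if needle in text.lower()]


if __name__ == "__main__":
    # Stand-in for the rows a CONTAINS() query on "query string" might return.
    candidates = ["a query string# here", "a query string here"]
    print(first_stage_phrase("query string#"))
    print(post_filter(candidates, "query string#"))
```

Because the filter runs after the database query, any row limit would have to be applied in the application after filtering rather than in the SQL itself.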
Jason Musgrove
I just tried; LIKE is horrible (coming in at approx. 15,297 ms); next is CHARINDEX (2,828 ms); and finally, the fastest by far is contains (at 97 ms). Your suggestion of application-side manipulation is interesting, but won't work; it would be far too complicated when trying to integrate that with the TOP function, for example.