views:

118

answers:

1

One of my clients is hooked on multi-column substring matching.

I understand that Contains and FreeText search for words (and at least in the case of Contains, word prefixes). However, based upon my understanding of this MSDN book, neither of these nor their variants are capable of searching substrings.

I have used LIKE rather extensively (Select * from A where A.B Like '%substr%')

Sample table A:

ID | Col1     | Col2     | Col3     |
-------------------------------------
1  | oklahoma | colorado | Utah     |
2  | arkansas | colorado | oklahoma |
3  | florida  | michigan | florida  |
-------------------------------------

The following code will give us row 1 and row 2:

 select * from A where Col1 like '%klah%' or Col2 like '%klah%' or Col3 like '%klah%'

This is rather ugly, probably slow, and I just don't like it very much. Probably because the implementations that I'm dealing with have 10+ columns that need searched.

The following may be a slight improvement as code readability goes, but as far as performance, we're still in the same ball park.

 select * from A where (Col1 + ' ' + Col2 + ' ' + Col3) like '%klah%'

I have thought about simply adding insert, update, and delete triggers that simply add the concatenated version of the above columns into a separate table that shadows this table.

Sample Shadow_Table:

ID | searchtext                 |
---------------------------------
1  | oklahoma colorado Utah     |
2  | arkansas colorado oklahoma |
3  | florida michigan florida   |
---------------------------------

This would allow us to perform the following query to search for '%klah%'

select * from Shadow_Table where searchtext like '%klah%'

I really don't like having to remember that this shadow table exists and that I'm supposed to use it when I am performing multi-column substring matching, but it probably yields pretty quick reads at the expense of write and storage space.

My gut feeling tells me there there is an existing solution built into SQL Server 2008. However, I don't seem to be able to find anything other than research papers on the subject.

Any help would be appreciated.

+1  A: 

From your description it sounds like you are looking for a way to improve exact searching. LIKE is the proper tool to use when you are trying to find character strings that EXACTLY match your string. If you are worried about performance, than you should consider indexing or even a custom index such as you've described.

Maybe consider a persisted computed column instead of a shadow table. The overhead on inserts/updates should be less than using triggers, and the query time will probably be equivalent.

On Full Text Search

Full text searching is designed as a natural language search.

Consider from the end user perspective. If I were searching for "Oklahoma", I would probably start with either "okla" or "ok" or "oklahoma". I would not search for "homa". This is the way our human minds think. Hence, "natural" language searching.

Natural language searching uses root stems and similar words to increase the total number of results. However, it is not optimal if you want all the results to specifically match your search term: e.g. Free text will match "I drove to my lesson" with "driving and lessons" even though neither word specifically appears.

Joel Potter
I think using a persisted computed column here is much more elegant than using a shadow table. In fact, I can name it something consistent across all tables that need this specific type of search. Also, thanks for the explanation on Full Text Search. For almost any search that faces the web, I can see that being very useful. However, in cases like mine, we will be searching exact records such as names and street names, so I'd much rather facilitate exact searching with substrings. If it's too slow, I can fall back on full text search. Thanks!
hamlin11