views: 93 | answers: 3
I've got a simple SELECT query that normally executes in under 1 second, but when I add a CONTAINS(column, 'text') predicate to the WHERE clause it suddenly runs for 20 seconds to a minute. The table it selects from has around 208k rows.

Any ideas what would cause this query to run so slowly just from adding the CONTAINS clause?

+2  A: 

Substring matching is a computationally expensive operation. Is the field indexed? If this is a major feature, consider a search-caching table so you can simply look up where the words occur.
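A minimal sketch of such a search-caching table (the table and column names `Documents`, `DocumentKeywords`, `Id`, etc. are illustrative assumptions, not from the question):

```sql
-- Hypothetical schema: one row per (keyword, document) pair,
-- indexed so word lookups become seeks instead of scans.
CREATE TABLE DocumentKeywords (
    Keyword    NVARCHAR(100) NOT NULL,
    DocumentId INT           NOT NULL,
    CONSTRAINT PK_DocumentKeywords PRIMARY KEY (Keyword, DocumentId)
);

-- Finding documents containing a word is then a simple indexed join:
SELECT d.*
FROM Documents d
JOIN DocumentKeywords k ON k.DocumentId = d.Id
WHERE k.Keyword = N'text';
```

The table has to be populated (and kept in sync) by whatever process writes the documents, which is the usual cost of this kind of cache.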

Emyr
+1  A: 

CONTAINS does a lot of extra work. There are a few things to note here:

  • NVarChar is always faster, so do CONTAINS(column, N'text')
  • If all you want to do is see if the word is in there, compare the performance to column LIKE '%' + text + '%'.
  • Compare query plans before and after, did it go to a table scan? If so, post more so we can figure out why.
  • As a last resort, you can break the text up into its individual words in a separate table so they can be indexed.
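For the second and third points, a quick side-by-side to compare against the execution plan (table and column names are placeholders):

```sql
-- Placeholder names; run each and compare the actual execution plans.
SELECT * FROM MyTable WHERE CONTAINS(MyColumn, N'text');  -- full-text predicate
SELECT * FROM MyTable WHERE MyColumn LIKE N'%text%';      -- plain substring scan
```

A leading-wildcard LIKE always scans, so if CONTAINS is slower than that, something is wrong with the full-text setup and the plan should show where.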
Stu
+1  A: 

Depending on the search keyword and the median length of the values in the column, it is logical that it would take a long time.

Consider searching for 'cookie' in a column with median length 100 characters in a dataset of 200k rows.

Best case scenario, with early outs, you would do 100 * 200k = 20M comparisons.

Worst case scenario, near-missing on every compare, you would do (5 * 100) * 200k = 100M comparisons.

Generally I would:

  • reorder your query to filter out as much as possible in advance prior to string matching
  • limit number of the results if you don't need all of them at once (TOP x)
  • reduce the number of characters in your search term
  • reduce the number of search terms by filtering out terms that are likely to match a lot, or not at all (if applicable)
  • cache query results if possible (however cache invalidation can get pretty tricky if you want to do it right)
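The first two suggestions might look like this (table, column, and filter names are purely illustrative; the optimizer decides the actual evaluation order, but shrinking the candidate set with cheap indexed filters still helps):

```sql
-- Illustrative names: filter on an indexed column and cap the result set,
-- so the expensive text predicate applies to fewer rows.
SELECT TOP (50) *
FROM MyTable
WHERE CreatedDate >= '2009-01-01'   -- cheap, indexed filter
  AND CONTAINS(Body, N'cookie');    -- expensive predicate
```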
Yannick M.