ansaurus

Question

SQL: Find rows where Column contains all of the given words

Answer 1

+2 A:

The only thing I can think of is to write a CLR function that does the LIKE comparisons. This should be many times faster.

Update: Now that I think about it, it makes sense that CLR would not help. Two other ideas:

1 - Try indexing Col1 and do this:

WHERE (Col1 LIKE @word1 + '%' OR Col1 LIKE '%' + @word1 + '%')
  AND (Col1 LIKE @word2 + '%' OR Col1 LIKE '%' + @word2 + '%')

Depending on the most common searches (starts with vs. substring), this may offer an improvement.

2 - Add your own full text indexing table where each word is a row in the table. Then you can index properly.
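
A minimal sketch of idea 2, assuming a hypothetical `docs` table standing in for the real one and a hypothetical `doc_words` table holding one (row id, word) pair per row. It uses Python's `sqlite3` only so that it is self-contained; the same GROUP BY / HAVING query should carry over to SQL Server. Note that this matches whole words split on whitespace, not arbitrary substrings, so it is not a drop-in replacement for `LIKE '%word%'`.

```python
import sqlite3


def build_word_index(conn):
    """Create the hypothetical tables: docs (the data) and doc_words (the word index)."""
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, col1 TEXT)")
    conn.execute("CREATE TABLE doc_words (doc_id INTEGER, word TEXT)")
    # Index on the word first so lookups by word can seek instead of scanning.
    conn.execute("CREATE INDEX ix_doc_words ON doc_words (word, doc_id)")


def add_doc(conn, doc_id, text):
    """Insert a row and one index entry per distinct lower-cased word in it."""
    conn.execute("INSERT INTO docs (id, col1) VALUES (?, ?)", (doc_id, text))
    words = set(text.lower().split())
    conn.executemany(
        "INSERT INTO doc_words (doc_id, word) VALUES (?, ?)",
        [(doc_id, w) for w in words],
    )


def find_all_words(conn, words):
    """Return ids of rows that contain every one of the given words."""
    wanted = sorted({w.lower() for w in words})
    if not wanted:
        return []
    placeholders = ",".join("?" for _ in wanted)
    sql = f"""
        SELECT d.id
        FROM docs AS d
        JOIN doc_words AS w ON w.doc_id = d.id
        WHERE w.word IN ({placeholders})
        GROUP BY d.id
        HAVING COUNT(DISTINCT w.word) = ?
        ORDER BY d.id
    """
    return [row[0] for row in conn.execute(sql, wanted + [len(wanted)])]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_word_index(conn)
    add_doc(conn, 1, "you and me")
    add_doc(conn, 2, "just you")
    add_doc(conn, 3, "me and you again")
    print(find_all_words(conn, ["you", "me"]))
```

The HAVING clause keeps only rows whose count of matched distinct words equals the number of search words, which is what "contains all of the given words" means here. Keeping `doc_words` in sync with the source table (for example with triggers) is left out of the sketch.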

RedFilter 2010-09-27 14:42:12

Even though I was against it at first, it seems that's the best solution so far...

veljkoz 2010-10-04 07:59:04

After trying it, I want to add an update: it's just incredibly slow. If the LIKE method finishes in 10 seconds, this CLR function needs... well, I don't know, I just stopped it after 20 minutes. So this solution is shelved as well.

veljkoz 2010-10-07 10:49:48

Post your code...

RedFilter 2010-10-07 22:35:04

@veljkoz: see my update

RedFilter 2010-10-08 04:36:48

#1 doesn't cover the cases where the rows don't start with the search word (but it is faster because it can use the index in that case). #2 is OK, and we were already thinking about it. Thanks for the updates!

veljkoz 2010-10-08 07:43:13

@veljkoz: That is incorrect, #1 does cover substring matches, see the `OR` clause.

RedFilter 2010-10-08 13:30:23

Oh, yes, you're right - sorry...

veljkoz 2010-10-08 14:51:59

Answer 2

+2 A:

http://msdn.microsoft.com/en-us/magazine/cc163473.aspx

MK 2010-09-27 14:45:52

Answer 3

+1 A:

You're going to end up with a full table scan anyway.

The collation can make a big difference apparently. Kalen Delaney in the book "Microsoft SQL Server 2008 Internals" says:

Collation can make a huge difference when SQL Server has to look at almost all characters in the strings. For instance, look at the following:
SELECT COUNT(*) FROM tbl WHERE longcol LIKE '%abc%'
This may execute 10 times faster or more with a binary collation than a nonbinary Windows collation. And with varchar data, this executes up to seven or eight times faster with a SQL collation than with a Windows collation.

Martin Smith 2010-09-27 14:46:24

This is a good point, but we already have collations set up appropriately...

veljkoz 2010-09-28 07:41:33

Answer 4

+1 A:

WITH Tokens AS (SELECT 'you' AS Token UNION ALL SELECT 'me')
SELECT ...
FROM YourTable AS t
-- A row qualifies when the number of tokens it matches equals the total number of tokens.
WHERE (SELECT COUNT(*) FROM Tokens WHERE t.Col1 LIKE '%' + Tokens.Token + '%')
    = (SELECT COUNT(*) FROM Tokens);

AlexKuznetsov 2010-09-27 14:56:43

An interesting approach, but unfortunately painfully slow...

veljkoz 2010-09-28 07:50:17
