ansaurus

Question

What is a suitable replacement for the SQL Like operator to increase performance?

Answer 1

+4 A:

How about pre-processing the items table once (each new entry added to it would also need to be processed) to create a word occurrence table like this:

CREATE TABLE WordItemOccurance
(
    [Word] varchar(50) not null,   -- one word extracted from an item
    ItemId int not null            -- the item that contains the word
        constraint FK_Items references ItemTable(ID)
)

Iterate over all your items, break each one into separate words, and add entries to the occurrence table as they are found.

Creating a clustered index on [Word] and joining to the Item table on ItemId should be fast.

Mitch Wheat 2008-12-31 01:27:07
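
The pre-processing idea above can be sketched roughly as follows. This is only an illustration under several assumptions: it uses Python's sqlite3 module with an in-memory database in place of the asker's database, an ordinary index in place of a clustered index (SQLite has no direct equivalent), and a made-up Description column and sample rows. The helpers index_item and find_items are hypothetical names, not part of the original answer.

```python
import re
import sqlite3

# In-memory SQLite database as a stand-in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ItemTable (ID INTEGER PRIMARY KEY, Description TEXT NOT NULL);
    CREATE TABLE WordItemOccurance (
        Word   TEXT    NOT NULL,
        ItemId INTEGER NOT NULL REFERENCES ItemTable(ID)
    );
    -- Plain index on Word; the answer suggests a clustered index instead.
    CREATE INDEX IX_Word ON WordItemOccurance (Word);
""")

def index_item(item_id, text):
    """Split text into lowercase words and store one row per distinct word."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    conn.executemany(
        "INSERT INTO WordItemOccurance (Word, ItemId) VALUES (?, ?)",
        [(w, item_id) for w in words],
    )

# Hypothetical sample data.
items = [(1, "Golf club, left-handed"), (2, "Night club tickets"), (3, "Dog leash")]
conn.executemany("INSERT INTO ItemTable (ID, Description) VALUES (?, ?)", items)
for item_id, text in items:
    index_item(item_id, text)

def find_items(word):
    """Return IDs of items containing the exact word (an equality match, no LIKE)."""
    rows = conn.execute(
        """SELECT i.ID FROM WordItemOccurance w
           JOIN ItemTable i ON i.ID = w.ItemId
           WHERE w.Word = ? ORDER BY i.ID""",
        (word.lower(),),
    )
    return [r[0] for r in rows]
```

Calling find_items("club") looks up the word with an equality comparison that the index can serve. Partial words such as "clu" find nothing with this design.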

Maybe even create a trigger on the Items table to pre-process newly added entries (if compact supports this...)

Mitch Wheat 2008-12-31 01:33:14

Not a bad idea. This will likely be the approach I go with, though the bloom filter idea looks interesting too.

Jason Down 2008-12-31 04:07:03

Answer 2

+3 A:

I voted for Mitch Wheat's answer, but there are a few tricks I would also test for effectiveness.

My big worry about a table full of [char],[int] rows is that you may still find yourself executing large volumes of pointless string comparisons, especially if you use %word% on this new table (every duplicated entry that doesn't match the search still gets compared).

I would probably opt for experimenting with

Words
-----
chars  | word_id 

WordsToEntry
------------
word_id | entry_id

and see whether the extra database overhead is a worthwhile trade for mitigating this possible problem (I can't test it, sorry).

Kent Fredric 2008-12-31 01:44:39
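
One way to try the two-table layout above is sketched here. It is untested against the asker's database: it assumes Python's sqlite3 module with an in-memory database, a UNIQUE constraint on chars, and made-up sample words. The helpers add_word and find_entries are hypothetical names.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Words (
        word_id INTEGER PRIMARY KEY,
        chars   TEXT NOT NULL UNIQUE      -- each distinct word stored once
    );
    CREATE TABLE WordsToEntry (
        word_id  INTEGER NOT NULL REFERENCES Words(word_id),
        entry_id INTEGER NOT NULL,
        PRIMARY KEY (word_id, entry_id)
    );
""")

def add_word(entry_id, word):
    """Register a word for an entry, reusing the Words row if it already exists."""
    conn.execute("INSERT OR IGNORE INTO Words (chars) VALUES (?)", (word,))
    (word_id,) = conn.execute(
        "SELECT word_id FROM Words WHERE chars = ?", (word,)
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO WordsToEntry (word_id, entry_id) VALUES (?, ?)",
        (word_id, entry_id),
    )

# Hypothetical sample data: "dog" occurs in two entries but is stored once.
for entry_id, word in [(1, "dog"), (2, "dog"), (3, "dogs"), (4, "cat"), (5, "hotdog")]:
    add_word(entry_id, word)

def find_entries(fragment):
    """Substring search: LIKE runs over distinct words only, then joins by integer id.

    Note: '%' and '_' inside fragment are not escaped in this sketch.
    """
    rows = conn.execute(
        """SELECT DISTINCT we.entry_id FROM Words w
           JOIN WordsToEntry we ON we.word_id = w.word_id
           WHERE w.chars LIKE ? ORDER BY we.entry_id""",
        ("%" + fragment + "%",),
    )
    return [r[0] for r in rows]
```

With this layout the %word% comparison is evaluated once per distinct word rather than once per occurrence. Whether that saving outweighs the extra join would need measuring on real data.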

You won't need to do a '%word%' match on the table, simply a 'word' match, which is the reason for using it.

Mitch Wheat 2008-12-31 02:33:25

The problem is that if you split merely on white space, you'll grab all the delimiting noise tokens too. Also, without %word% you won't be able to find words that are part of compositions, e.g. a search for "dog" won't match "dogs".

Kent Fredric 2008-12-31 03:06:23
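
Both concerns in the comment above can be seen in a small sketch. The sample sentence and the helper names exact_matches and substring_matches are invented for illustration, and the regular expression is just one possible way to strip punctuation.

```python
import re

# Hypothetical sample text containing quotes and punctuation noise.
text = 'The "dog" chased two dogs -- and a hotdog!'

# Splitting on white space keeps punctuation attached and yields noise tokens.
naive = text.lower().split()

# Extracting runs of letters and digits drops the noise.
clean = re.findall(r"[a-z0-9]+", text.lower())

def exact_matches(term, tokens):
    """Tokens equal to the term (what a plain 'word' match finds)."""
    return [t for t in tokens if t == term]

def substring_matches(term, tokens):
    """Tokens containing the term anywhere (what a '%word%' match finds)."""
    return [t for t in tokens if term in t]

print(naive)
print(clean)
print(exact_matches("dog", naive))      # the quoted token does not equal "dog"
print(exact_matches("dog", clean))
print(substring_matches("dog", clean))
```

On the cleaned tokens an exact match finds only "dog", while the substring match also picks up "dogs" and "hotdog", which is the behaviour the asker wants.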

Good point. In this case it's important that singular words return all plurals and words that are contained within larger words.

Jason Down 2008-12-31 04:08:50

Answer 3

+1 A:

You could try using a bloom filter.

Jauder Ho 2008-12-31 02:23:12

Interesting read. Worth taking a crack at, even if just for the sake of interest. Thanks Jauder.

Jason Down 2008-12-31 04:06:15

I couldn't figure out how to find words contained within other words using this approach, so I think it's not right for my particular circumstances (e.g. "club" must also return "clubs").

Jason Down 2009-01-04 17:43:58

Bloom filters only tell you if something exists. What you are asking for is stemming capability. See http://www.google.com/search?q=stemming (there's a bunch of stemming algos listed).

Jauder Ho 2009-01-17 06:13:25
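
To make the distinction in the last comment concrete, here is a toy sketch of a bloom filter combined with a deliberately naive stemmer. Everything in it is an assumption made for illustration: the BloomFilter class, its sizes, the use of SHA-256 for hashing, and naive_stem (which only strips a trailing "s") are hypothetical and far simpler than a real stemming algorithm.

```python
import hashlib

class BloomFilter:
    """Toy bloom filter: answers 'definitely not present' or 'possibly present'."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means definitely absent; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

def naive_stem(word):
    """Very crude stand-in for a stemmer: lowercases and strips one trailing 's'."""
    w = word.lower()
    return w[:-1] if w.endswith("s") and len(w) > 3 else w

# Store stems rather than raw words, and stem the query the same way.
bf = BloomFilter()
for word in ["clubs", "Dogs", "leash"]:
    bf.add(naive_stem(word))

print(bf.might_contain(naive_stem("club")))   # stem "club" was added via "clubs"
```

The filter itself only does exact membership tests, so matching "club" against "clubs" comes entirely from stemming both sides first. Stemming does not help with words contained inside unrelated larger words.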
