ansaurus

Question

How are efficient consecutive word searches implemented?

Answer 1

+1 A:

You probably want to look at a trie. Tries are very efficient in scenarios like this, but they consume a lot of memory to store the whole structure.
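As a rough illustration of the idea, here is a minimal character-level trie in Python. The class and method names (`TrieNode`, `Trie`, `insert`, `contains`, `starts_with`) are made up for this sketch, and it assumes you store individual words keyed by character; a real search system might key nodes by whole words instead.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the next TrieNode
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            # Reuse the existing child or create a new one for this character.
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _walk(self, s):
        # Follow s character by character; return None if the path breaks.
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        return self._walk(prefix) is not None
```

Lookups cost time proportional to the length of the word, not the number of stored words. The memory cost mentioned above comes from keeping one node per distinct prefix.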

Ravi Gummadi 2010-10-07 20:50:07

Answer 2

A:

I am not sure how the SQL database would narrow down its search, but eventually it will come down to string matching.

When you have a target string and a pattern string, the simplest way to do the comparison is to start at the beginning of the target string and try matching it with the pattern string character by character. If the match fails, you advance to the next character in the target string and repeat the above step. This is inefficient because the worst-case complexity is O(m*n), where m is the number of characters in the pattern string and n is the number of characters in the target string.
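The character-by-character approach described above could be sketched like this in Python. The function name `naive_find` is just a label for this example; it returns the first matching index or -1.

```python
def naive_find(target, pattern):
    """Return the index of the first occurrence of pattern in target, or -1."""
    n, m = len(target), len(pattern)
    # Try every starting position where the pattern could still fit.
    for start in range(n - m + 1):
        k = 0
        # Compare character by character until a mismatch or a full match.
        while k < m and target[start + k] == pattern[k]:
            k += 1
        if k == m:
            return start
    return -1
```

The outer loop runs up to n times and the inner loop up to m times, which is where the O(m*n) bound comes from.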

There is an algorithm called Rabin-Karp that can perform this search in O(m+n) expected time using a rolling hash. Its worst case is still O(m*n) if many hash collisions occur.
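For reference, a minimal sketch of Rabin-Karp in Python might look like the following. The `base` and `mod` values are arbitrary choices for this example, not anything prescribed by the algorithm, and the function name `rabin_karp` is only a label.

```python
def rabin_karp(target, pattern, base=256, mod=1_000_000_007):
    """Return the index of the first occurrence of pattern in target, or -1."""
    n, m = len(target), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1

    # Weight of the leading character in a window of length m.
    high = pow(base, m - 1, mod)

    # Hash the pattern and the first window of the target.
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(target[i])) % mod

    for start in range(n - m + 1):
        # Compare the actual text only when the hashes agree,
        # which rules out false positives from collisions.
        if p_hash == t_hash and target[start:start + m] == pattern:
            return start
        if start < n - m:
            # Roll the hash: drop target[start], append target[start + m].
            t_hash = ((t_hash - ord(target[start]) * high) * base
                      + ord(target[start + m])) % mod
    return -1
```

Each window hash is derived from the previous one in constant time, so the full character comparison only runs when the hashes match.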

Of course, MySQL could have precomputed hashes that would help reduce the number of target strings.

pratn 2010-10-07 21:10:19

Answer 3

+1 A:

What you want is a sorted inverted index of the words in your document. Basically, if your text is

"Here is an example sentence. This is how you index things" you turn this into:

Here: 1
is: 2, 7
an: 3
example: 4
......
......

Then, when you are searching for a sequence of words, you look up the list of positions for each word. You then walk the sorted position lists simultaneously, as if you were merging them. While merging, it is easy to spot any place where the words occur in exactly the sequence you want: each word's position is one more than the position of the word before it.
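Here is a small Python sketch of that idea, using the example sentence above. The helper names `build_index` and `find_phrase` are made up for illustration, and two things are assumptions on my part: words are split on non-word characters, and they are lowercased so that "Here" and "here" share one entry.

```python
import re
from collections import defaultdict


def build_index(text):
    """Map each lowercased word to its sorted list of 1-based word positions."""
    index = defaultdict(list)
    for position, word in enumerate(re.findall(r"\w+", text.lower()), start=1):
        index[word].append(position)
    return index


def find_phrase(index, phrase):
    """Return the positions where the words of phrase occur consecutively."""
    words = re.findall(r"\w+", phrase.lower())
    if not words:
        return []
    lists = [index.get(w, []) for w in words]
    if any(not positions for positions in lists):
        return []  # some word never occurs in the document

    pointers = [0] * len(words)  # one cursor per position list
    matches = []
    for start in lists[0]:
        found = True
        for i in range(1, len(words)):
            positions = lists[i]
            wanted = start + i  # word i must sit exactly i slots after the first
            # Cursors only move forward, as in a merge of sorted lists.
            while pointers[i] < len(positions) and positions[pointers[i]] < wanted:
                pointers[i] += 1
            if pointers[i] == len(positions):
                return matches  # word i has no later occurrences left
            if positions[pointers[i]] != wanted:
                found = False
                break
        if found:
            matches.append(start)
    return matches


text = "Here is an example sentence. This is how you index things"
index = build_index(text)
print(index["is"])                       # [2, 7]
print(find_phrase(index, "is how you"))  # [7]
```

Because every cursor only advances, a query touches each position list at most once, no matter how long the document is.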

2010-10-24 16:55:43
