OK, I'm building a search engine, and the search module is able to extract the relevant words. I now have a list of words and their offsets in the original source text. Is it a bad idea to use Levenshtein distance to compute the difference between the query string and a portion of the source text (beginning at a given word's offset and running up to the query string's length)? I ask because I was thinking this would help me generate excerpts faster.

It doesn't need proximity search etc., only the normal 'ANY' and 'ALL' modes. By the way, the results are already sorted, so I'm only looking into excerpt generation now. Thanks.
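
For concreteness, the idea described above could be sketched roughly like this. It is only an illustration: the function names `levenshtein` and `score_windows` are made up for this example, and it assumes the offsets are character positions into the source string.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def score_windows(source: str, query: str, offsets: list[int]) -> list[tuple[int, int]]:
    """Compare the query with a query-length slice at each offset.

    Returns (offset, distance) pairs, closest slice first.
    Assumption: offsets are character indices into `source`.
    """
    scored = []
    for off in offsets:
        window = source[off:off + len(query)]
        scored.append((off, levenshtein(query, window)))
    return sorted(scored, key=lambda pair: pair[1])
```

Each comparison costs on the order of `len(query)` squared operations, so the total work grows with the number of offsets tested.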

+1  A: 

Build a one-to-many mapping from the text to the words it contains (and their counts). This "bag of words" vector can then be used for a lot of different techniques.
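
A minimal sketch of such a mapping, assuming simple regex tokenization and case-insensitive matching (both are assumptions, as are the helper names `bag_of_words` and `matches`):

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Map each lowercased word in the text to its count, in a single pass."""
    return Counter(re.findall(r"\w+", text.lower()))


def matches(bag: Counter, query_words: list[str], mode: str = "ANY") -> bool:
    """Check a query against the bag: 'ALL' needs every word, 'ANY' needs at least one."""
    hits = [word.lower() in bag for word in query_words]
    return all(hits) if mode == "ALL" else any(hits)


bag = bag_of_words("The cat sat on the mat")
print(bag["the"])                            # count of "the" in the text
print(matches(bag, ["cat", "dog"], "ANY"))   # at least one query word present?
print(matches(bag, ["cat", "dog"], "ALL"))   # every query word present?
```

The bag only needs to be built once per document, after which both 'ANY' and 'ALL' checks are dictionary lookups.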

bayer
That was my plan, but then I thought maybe this technique could be a lot faster, with fewer iterations; that's why I asked here.
kar
I don't understand what iterations you mean. You need a single pass to build that vector.
bayer