using levenshtein distance to generate excerpt | ansaurus

Q:

OK, I'm building a search engine, and the search module can already extract the relevant words, so I now have a list of words and their offsets in the original source text. Is it a bad idea to use Levenshtein distance to compute the difference between the query string and a portion of the source text (beginning at a given word's offset and extending for the query string's length)? I was thinking this would help me generate excerpts faster.

It doesn't need proximity search etc., only the normal 'ANY' and 'ALL' modes. BTW, the results are already sorted, so I'm only looking into excerpt generation now. Thanks.
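For illustration, here is a minimal Python sketch of the idea described in the question. It compares the query with the source slice that starts at each word offset and has the query's length, then ranks the offsets by edit distance. It assumes the offsets are character indices into the source text. `levenshtein` and `window_distances` are hypothetical helper names, not part of any library.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute all cost 1)."""
    # Keep only the previous DP row, so memory is O(len(b)).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def window_distances(source: str, query: str, offsets: list[int]) -> list[tuple[int, int]]:
    """Hypothetical helper: for each word offset (assumed to be a character index),
    compare the query with the source slice of the same length.

    Returns (offset, distance) pairs sorted by distance, best match first.
    """
    results = []
    for off in offsets:
        window = source[off:off + len(query)]
        results.append((off, levenshtein(query.lower(), window.lower())))
    results.sort(key=lambda pair: pair[1])
    return results
```

Each comparison costs O(m²) for a query of m characters. The fixed-length slice also penalizes matches whose surrounding text differs in length, so this scores only how closely the text at an offset resembles the literal query string.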

A:

Build a one-to-many mapping from the text to the words it contains (and their counts). This "bag of words" vector can then be used for many different techniques.

bayer 2009-07-17 11:53:01
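The following is a minimal sketch of the bag-of-words idea from the answer. It assumes a simple `\w+` tokenizer and case-insensitive matching. `bag_of_words` and `matches` are hypothetical names. The same bag can be tested against the 'ANY' and 'ALL' modes mentioned in the question.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of word characters, lower-cased.
WORD_RE = re.compile(r"\w+")


def bag_of_words(text: str) -> Counter:
    """Single pass over the text: map each lower-cased word to its count."""
    return Counter(m.group(0).lower() for m in WORD_RE.finditer(text))


def matches(bag: Counter, terms: list[str], mode: str = "ALL") -> bool:
    """Check a text's bag of words against the query terms.

    'ANY' requires at least one term to be present; 'ALL' requires every term.
    """
    # Indexing a Counter with a missing key returns 0 without inserting it.
    present = [bag[t.lower()] > 0 for t in terms]
    if mode == "ANY":
        return any(present)
    if mode == "ALL":
        return all(present)
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, `bag_of_words("The cat sat on the mat.")` counts "the" twice. Checking the terms `["cat", "dog"]` against that bag succeeds in 'ANY' mode but fails in 'ALL' mode.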

That was my plan, but then I thought this technique might do it a lot faster with fewer iterations; that's why I asked here.

kar 2009-07-17 12:03:56

I don't understand which iterations you mean. You need only a single pass to build that vector.

bayer 2009-07-17 16:16:16
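One possible way to apply the single-pass point to excerpt generation is to slide a fixed-size word window over the text once. A running bag of words is updated as words enter and leave the window, and the window containing the most distinct query terms is kept. The window size (10 words by default) and the scoring rule are assumptions, and `best_excerpt` is a hypothetical helper.

```python
import re
from collections import Counter


def best_excerpt(text: str, terms: list[str], window: int = 10) -> str:
    """Hypothetical helper: one pass over the tokens with a running bag of words.

    Returns the text of the word window that contains the most distinct query
    terms (the earliest such window on ties).
    """
    # (word, start, end) for every token; offsets let us slice the original text.
    tokens = [(m.group(0).lower(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    if not tokens:
        return ""
    wanted = {t.lower() for t in terms}
    counts = Counter()
    best_score, best_start = -1, 0
    for i, (word, _, _) in enumerate(tokens):
        counts[word] += 1                      # word enters the window
        if i >= window:
            counts[tokens[i - window][0]] -= 1  # oldest word leaves the window
        start = max(0, i - window + 1)
        score = sum(1 for t in wanted if counts[t] > 0)
        if score > best_score:
            best_score, best_start = score, start
    end = min(best_start + window, len(tokens)) - 1
    return text[tokens[best_start][1]:tokens[end][2]]
```

Because the bag is updated incrementally, each token is added and removed at most once. This matches the answer's point that no extra iterations over the text are needed.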
