views:

3259

answers:

6

I would like to be able to search a table as follows for smith as get everything that it within 1 variance.

Data:

O'Brien
Smithe
Dolan
Smuth
Wong
Smoth
Gunther
Smiht

I have looked into using Levenshtein distance does anyone know how to implement this with it?

+7  A: 

Does this help? Levenshtein Distance as a MySQL stored function (Google Cache)

schnaader
+1 - I have implemented this one, I looked at it before posting. Its works but the putting it in a search(with some performance) is what I am trying to figure out.
Andrew Clark
+3  A: 

In order to efficiently search using levenshtein distance, you need an efficient, specialised index, such as a bk-tree. Unfortunately, no database system I know of, including MySQL, implements bk-tree indexes. This is further complicated if you're looking for full-text search, instead of just a single term per row. Off-hand, I can't think of any way that you could do full-text indexing in a manner that allows for searching based on levenshtein distance.

Nick Johnson
+1  A: 

An implementation for the damerau-levenshtein distance can be found here: Damerau-Levenshtein algorithm: Levenshtein with transpositions The improvement over pure Levenshtein distance is that the swapping of characters is considered. I found it in the comments of schnaader's link, thanks!

Ponk
unfortunately this result in it being 10% slower. I have however implemented the string length, he proposes using string at max or smaller, I have implemented a compare on only string +/- 1 length.
Andrew Clark
A: 

I am setting up a search based on Levenshtein or Damerau-Levenshtein (probably the latter) for multiple searches over an indexed text, based on a paper by by Gonzalo Navarro and Ricardo Baeza-yates: link text

After building a suffix array (see wikipedia), if you are interested in a string with at most k mismatches to the search string, break the search string into k+1 pieces; at least one of those must be intact. Find the substrings by binary search over the suffix array, then apply the distance function to the patch around each matched piece.

The links here were a big help. Thanks to all.

Hugh
A: 

Please give me a call. My company, Exorbyte, was developed this technology. My number is 503-488-5669

Dan Lesage