levenshtein-distance

Damerau-Levenshtein Distance, adding a threshold

I have the following implementation, but I want to add a threshold, so that if the result is going to be greater than the threshold, the calculation stops and returns. How would I go about that? EDIT: Here is my current code; threshold is not yet used, but the goal is that it will be: public static int DamerauLevenshteinDistance(string string1, string...
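One possible way to add such a threshold, sketched in Python rather than the asker's C#: this is the restricted (optimal string alignment) variant of Damerau-Levenshtein, and it stops as soon as every cell in the current row exceeds the threshold. The name `bounded_osa_distance` and the choice to return `None` on overflow are assumptions made for illustration, not part of the original code.

```python
def bounded_osa_distance(a, b, threshold):
    """Return the OSA distance between a and b, or None if it exceeds threshold."""
    # Strings whose lengths differ by more than the threshold can never be within it.
    if abs(len(a) - len(b)) > threshold:
        return None
    prev2 = None                    # row i-2, needed for transpositions
    prev = list(range(len(b) + 1))  # row i-1 (distances from the empty prefix of a)
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                cur[j] = min(cur[j], prev2[j - 2] + 1)  # adjacent transposition
        # Row minima never decrease, so the final value cannot fall back
        # under the threshold once a whole row is above it.
        if min(cur) > threshold:
            return None
        prev2, prev = prev, cur
    return prev[-1] if prev[-1] <= threshold else None
```

With this sketch, `bounded_osa_distance("ca", "ac", 1)` counts the swap as a single edit, while `bounded_osa_distance("kitten", "sitting", 2)` gives up and returns `None`.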

Modifying Levenshtein Distance algorithm to not calculate all distances.

I'm working on a fuzzy search implementation and as part of the implementation, we're using Apache's StringUtils.getLevenshteinDistance. At the moment, we're going for a specific maximum average response time for our fuzzy search. After various enhancements and with some profiling, the place where the most time is spent is calculating t...

Calculating a relative Levenshtein distance - make sense?

I am using both Daitch-Mokotoff soundexing and Damerau-Levenshtein to find out if a user entry and a value in the application are "the same". Is Levenshtein distance supposed to be used as an absolute value? If I have a 20 letter word, a distance of 4 is not so bad. If the word has 4 letters... What I am now doing is taking the distanc...
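A minimal sketch of one way to make the distance relative, assuming it is divided by the length of the longer string (the excerpt is cut off before the asker's own formula, so this normalization is an assumption). The helper name `relative_distance` is hypothetical.

```python
def relative_distance(distance, len_a, len_b):
    """Scale an absolute edit distance into [0, 1] by the longer string's length."""
    longest = max(len_a, len_b)
    # Two empty strings are identical, so treat their relative distance as 0.
    return 0.0 if longest == 0 else distance / longest
```

Under this assumption, a distance of 4 on a 20-letter word gives 0.2, while the same distance on a 4-letter word gives 1.0, meaning every character changed.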

Is it possible to calculate the edit distance between a regexp and a string?

If so, please explain how. Re: what is distance -- "The distance between two strings is defined as the minimal number of edits required to convert one into the other." For example, xyz to XYZ would take 3 edits, so the string xYZ is closer to XYZ than xyz is. If the pattern is [0-9]{3} or for instance 123, then a23 would be closer to the ...

Levenshtein DFA in .NET

Good afternoon. Does anyone know of an "out-of-the-box" implementation of a Levenshtein DFA (deterministic finite automaton) in .NET (or easily translatable to it)? I have a very big dictionary with more than 160000 different words, and I want to, given an initial word w, find all known words at Levenshtein distance at most 2 of w in an ef...

Advice on how to improve a current fuzzy search implementation.

Hello. I'm currently working on implementing a fuzzy search for a terminology web service and I'm looking for suggestions on how I might improve the current implementation. It's too much code to share, but I think an explanation might suffice to prompt thoughtful suggestions. I realize it's a lot to read but I'd appreciate any help. Fi...

How to modify Levenshtein's Edit Distance to count "adjacent letter exchanges" as 1 edit

I'm playing around with Levenshtein's Edit Distance algorithm, and I want to extend this to count transpositions -- that is, exchanges of adjacent letters -- as 1 edit. The unmodified algorithm counts insertions, deletions or substitutions needed to reach a certain string from another. For instance, the edit distance from "KITTEN" to "SITTI...

Levenshtein Distance Algorithm better than O(n*m)?

I have been looking for an advanced Levenshtein distance algorithm, and the best I have found so far is O(n*m) where n and m are the lengths of the two strings. The reason the algorithm is at this scale is space, not time, because it creates a matrix of the two strings such as this one: Is there a publicly-available l...
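On the space concern specifically, a commonly used trick is to keep only the previous row of the matrix instead of the whole thing. The sketch below assumes plain Levenshtein distance (no transpositions); the function name `levenshtein_two_rows` is made up for illustration. Running time stays O(n*m), but memory drops to O(min(n, m)).

```python
def levenshtein_two_rows(a, b):
    """Plain Levenshtein distance using two rows instead of a full matrix."""
    # Put the shorter string along the row so each row is as small as possible.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]
```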

How do I convert between a measure of similarity and a measure of difference (distance)?

Is there a general way to convert between a measure of similarity and a measure of distance? Consider a similarity measure like the number of 2-grams that two strings have in common: 2-grams('beta', 'delta') = 1; 2-grams('apple', 'dappled') = 4. What if I need to feed this to an optimization algorithm that expects a measure of differen...
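To make the 2-gram example concrete, here is a small sketch that reproduces the two counts above and shows one possible conversion to a distance, the Jaccard distance (1 minus shared over total distinct 2-grams). Choosing Jaccard is an assumption, not something stated in the question, and the function names are hypothetical. Each 2-gram is counted once per string because sets are used.

```python
def bigrams(s):
    """Set of distinct adjacent character pairs in s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}

def shared_bigrams(a, b):
    """Similarity: number of distinct 2-grams the two strings have in common."""
    return len(bigrams(a) & bigrams(b))

def jaccard_distance(a, b):
    """One possible distance: 1 - |shared 2-grams| / |all distinct 2-grams|."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 0.0  # neither string has any 2-grams; treat them as identical
    return 1.0 - len(x & y) / len(x | y)
```

Here `shared_bigrams('beta', 'delta')` is 1 and `shared_bigrams('apple', 'dappled')` is 4, matching the counts in the question, and the distance for identical strings comes out as 0.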