levenshtein

How do you implement Levenshtein distance in Delphi?

I'm posting this in the spirit of answering your own questions. The question I had was: How can I implement the Levenshtein algorithm for calculating edit distance between two strings, as described here, in Delphi? Just a note on performance: This thing is very fast. On my desktop (2.33 GHz dual-core, 2 GB RAM, WinXP), I can run throug...

Did you mean...? How to Guess What the User Meant to Type (on a 404 Page)

I'm customizing the 404 page for my website. I'd like it to include a "Did you mean...?" I need to figure out how to do this. Here's what I'm doing so far: I come up with a broad list of files that the user might be looking for, then use levenshtein() to compare each possible filename to the mistyped filename. Those with the lowest d...
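The ranking step described in this excerpt (score every candidate filename against the mistyped one, then keep the closest) can be sketched as follows. This is a minimal Python illustration, not the asker's PHP code: `levenshtein`, `did_you_mean`, and the sample file names are hypothetical stand-ins introduced here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance using the classic two-row dynamic programme."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def did_you_mean(requested: str, candidates: list[str], limit: int = 3) -> list[str]:
    """Return up to `limit` candidates, closest first (ties keep input order)."""
    return sorted(candidates, key=lambda name: levenshtein(requested, name))[:limit]


# Hypothetical file names, for illustration only.
pages = ["contact.html", "about.html", "index.html", "products.html"]
print(did_you_mean("contcat.html", pages, limit=2))
```

In practice a cutoff on the distance (for example, relative to the length of the requested name) would likely be needed so that unrelated pages are not suggested.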

Levenshtein algorithm: How do I meet these text editing requirements?

Hi, I'm using the Levenshtein algorithm to meet these requirements: When finding a word of N characters, the words to suggest as corrections from my dictionary database are: Every dictionary word of N characters that differs from the found word by 1 character. Example: found word: bearn, dictionary word: bears Every dictionary word ...

Fuzzy matching of product names

I need to automatically match product names (cameras, laptops, TVs etc.) that come from different sources to a canonical name in the database. For example "Canon PowerShot a20IS", "NEW powershot A20 IS from Canon" and "Digital Camera Canon PS A20IS" should all match "Canon PowerShot A20 IS". I've worked with Levenshtein distance with s...

Is there an edit distance algorithm that takes "chunk transposition" into account?

I put "chunk transposition" in quotes because I don't know whether or what the technical term should be. Just knowing if there is a technical term for the process would be very helpful. The Wikipedia article on edit distance gives some good background on the concept. By taking "chunk transposition" into account, I mean that Turing, Al...

Matching an approximate string in a Core Data store

Hi everyone. I have a small problem with the Core Data application I'm currently writing. I have two different models, contexts and persistent stores. One is for my app data, the other one is for a website with information relevant to me. Most of the time, I match exactly one record from my app to another record from the other source. Someti...

Is Levenshtein's distance the right way to tackle this Edit Steps problem?

I'm familiar with Levenshtein's distance, so I decided I would use it to solve UVA's Edit Steps Ladder problem. My solution is: import java.io.*; import java.util.*; class LevenshteinParaElJuez implements Runnable{ static String ReadLn(int maxLength){ // utility function to read from stdin, ...

Speeding up levenshtein / similar_text in PHP

I am currently using similar_text to compare a string against a list of ~50,000, which works, although due to the number of comparisons it's very slow. It takes around 11 minutes to compare ~500 unique strings. Before running this I do check the databases to see whether it has been processed in the past, so every time after the initial run i...

Damerau-Levenshtein distance for words

I am looking for such an algorithm, but one that makes substitutions between words and not between letters. Is there such an algorithm? I am looking for an implementation with SQL Server, but the name of the algorithm will be good enough. ...
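One way to read this request is to run the usual dynamic programme over word tokens instead of characters. The sketch below is a Python illustration only (the asker wants SQL Server), and it implements the restricted "optimal string alignment" variant, which counts swaps of two adjacent words but is not the full Damerau-Levenshtein distance. The name `word_edit_distance` and whitespace tokenisation are assumptions made here.

```python
def word_edit_distance(a: list[str], b: list[str]) -> int:
    """Optimal-string-alignment distance over word tokens.

    Insertions, deletions, substitutions, and swaps of two adjacent
    words each cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i words of a
    for j in range(cols):
        d[0][j] = j  # insert all j words of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
            # Adjacent transposition: the last two words are swapped.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(word_edit_distance("the quick brown fox".split(),
                         "the brown quick fox".split()))  # 1
```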

Levenshtein distance combination

LD = Levenshtein Distance. Just doing a few examples on paper, this seems to work, but does anyone know if this is always true? Let's say I have 3 strings: BOT, BOB, BOM. LD(BOT,BOB) = 1 and LD(BOB,BOM) = 1, then LD(BOT,BOM) = max(LD(BOT,BOB), LD(BOB,BOM)) = 1. Or: BAAB, BBAB, BCCD. LD(BBAB,BAAB) = 1 and LD(BBAB,BCCD) = 3, then LD...
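The excerpt is cut off, so the exact conjecture is not fully visible. Assuming it is LD(a,c) = max(LD(a,b), LD(b,c)), a quick check suggests it does not hold in general, while the triangle inequality LD(a,c) <= LD(a,b) + LD(b,c) does. The Python sketch below uses a small memoised recursive `ld` written for this illustration; the strings "A", "AB", "ABC" are a counterexample chosen here, not taken from the question.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def ld(a: str, b: str) -> int:
    """Levenshtein distance by memoised recursion (fine for short strings)."""
    if not a:
        return len(b)
    if not b:
        return len(a)
    cost = 0 if a[-1] == b[-1] else 1
    return min(ld(a[:-1], b) + 1,            # delete last char of a
               ld(a, b[:-1]) + 1,            # insert last char of b
               ld(a[:-1], b[:-1]) + cost)    # substitute or match


# The question's first example is consistent with the max rule...
print(ld("BOT", "BOB"), ld("BOB", "BOM"), ld("BOT", "BOM"))  # 1 1 1

# ...but this triple is not: both steps cost 1, the direct distance is 2.
print(ld("A", "AB"), ld("AB", "ABC"), ld("A", "ABC"))        # 1 1 2
```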

How to configure SOLR to use Levenshtein approximate string matching?

Does Apache's Solr search engine provide approximate string matches, e.g. via the Levenshtein algorithm? I'm looking for a way to find customers by last name. But I cannot guarantee the correctness of the names. How can I configure SOLR so that it would find the person "Levenshtein" even if I search for "Levenstein" ? ...

Recommendation needed: Rails, Postgres and fuzzy full text search

I have a Rails app with a Postgres backend. I need to add full text search which would allow fuzzy searches based on Levenshtein distance or other similar metrics. Add the fact that the lexer/stemmer has to work with non-English words (it would be ok to just switch language-dependent features off when lexing, to not mess with the target l...

Place dots where a word is misspelled

Hello, I'm creating a web app in PHP where people can try to translate words they need to learn for school. For example, someone needs to translate the Dutch word 'weer' to 'weather' in English, but unfortunately he types 'whether'. Because he almost typed the right word, I want to give him another try, with dots '.' on the places wher...
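The excerpt is truncated, so the exact rule for placing the dots is not visible. One simple reading is a position-by-position comparison against the correct word. The Python sketch below (the asker works in PHP) shows that reading only; `hint` is a hypothetical helper, and it does not realign after an inserted or missing letter, which would need an edit-distance alignment instead.

```python
def hint(correct: str, typed: str, mask: str = ".") -> str:
    """Keep letters typed in the right position; mask every other position."""
    return "".join(
        ch if i < len(typed) and typed[i] == ch else mask
        for i, ch in enumerate(correct)
    )


print(hint("weather", "whether"))  # w..ther
```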

Modifying a Levenshtein distance function to calculate distance between two sets of x-y coordinates?

I've been trying to work on modifying a Levenshtein Distance function so that it can find the distance between two lines, or sets of x-y coordinates (in other words, how similar or different the lines are, not their geometric distance). I'm running into some problems though. I get how you take the value above to get deletion cost, and th...

Seeking algo for text diff that detects and can group similar lines

I am in the process of writing a diff text tool to compare two similar source code files. There are many such "diff" tools around, but mine shall be a little improved: If it finds that a set of lines is mismatched on both sides (i.e. in both files), it shall not only highlight those lines but also highlight the individual changes in these l...

Will these optimizations to my Ruby implementation of diff improve performance in a Rails app?

<tl;dr> In source version control diff patch generation, would it be worth it to use the optimizations listed at the very bottom of this writing (see <optimizations>) in my Ruby implementation of diff for making diff patches? </tl;dr> <introduction> I am programming something I have never done before and there might already be tools out...

How to convert a Python/Cython unicode string to an array of long integers, to do Levenshtein edit distance

I have the following Cython code (adapted from the bpbio project) that does Damerau-Levenshtein edit-distance calculation: #--------------------------------------------------------------------------- cdef extern from "stdlib.h": ctypedef unsigned int size_t size_t strlen(char *s) void *malloc(size_t size) void *calloc(size_t n...

Levenshtein Generalization for Graphs?

Is there a generalization of the Levenshtein distance for searching for structures in graphs? ...

How to configure Solr / Lucene to perform Levenshtein edit distance searching?

I have a long list of words that I put into a very simple SOLR / Lucene database. My goal is to find 'similar' words from the list for single-term queries, where 'similarity' is specifically understood as (Damerau) Levenshtein edit distance. I understand SOLR provides such a distance for spelling suggestions. In my SOLR schema.xml, I ha...

Haskell tail-recursion performance question for Levenshtein distances

I'm playing around with calculating Levenshtein distances in Haskell, and am a little frustrated with the following performance problem. If you implement it the most 'normal' way for Haskell, like below (dist), everything works just fine: dist :: (Ord a) => [a] -> [a] -> Int dist s1 s2 = ldist s1 s2 (L.length s1, L.length s2) ldist :: (Or...