views:

291

answers:

5

LD = Levenshtein Distance

Just doing a few examples on paper, this seems to work, but does anyone know if this is always true?

Lets say I have 3 strings

BOT

BOB

BOM

LD(BOT,BOB) = 1

and

LD(BOB,BOM)=1

then

LD(BOT,BOM)=max(LD(BOT,BOB) , LD(BOB,DOM))=1

OR

BAAB

BBAB

BCCD

LD(BBAB,BAAB) = 1

and

LD(BBAB,BCCD)=3

then

LD(BAAB,BCCD)=max(LD(BBAB,BAAB), LD(BBAB,BCCD))=3

I'd like to know if this always applies.

That is,

LD(B,C) = max (LD(A,C),LD(A,B))


EDIT -- Added at 10/22/2009 7:08PM PST

I'm starting to think this holds for words of the same length, otherwise you can still do it but you have to add the absolute value of the difference in word length.

In essence LD(B,C) = max(LD(A,C),LD(A,B)) + abs(length(B)-length(C))

+1  A: 

This is a regular Dynamic Programming problem. The Wikipedia entry has a Proof of Correctness part. Are you looking for something else?

dirkgently
A: 

Nothing is better than a test. If you know C# run it through this.

public Int32 CalculateDistance(String x, String y)
{
    Int32 xl = x.Length;
    Int32 yl = y.Length;
    Int32[,] matrix = new Int32[xl + 1, yl + 1];

    for (Int32 i = 0; i <= xl; i++)
    {
     matrix[i, 0] = i;
    }

    for (Int32 i = 0; i <= yl; i++)
    {
     matrix[0, i] = i;
    }

    for (Int32 j = 1; j <= yl; j++)
    {
     for (Int32 i = 1; i <= xl; i++)
     {      
      if (x[i - 1] == y[j - 1])
      {
       matrix[i, j] = matrix[i - 1, j - 1];
      }
      else     
      {
       matrix[i, j] = Min((matrix[i - 1, j] + 1), (matrix[i, j - 1] + 1), (matrix[i - 1, j - 1] + 1));
      }
     }
    } 

    return matrix[xl, yl];
}
ChaosPandion
I'll do that -- I know C#. I'll just setup a thousand random strings and see if it holds.
Matt
A: 

Didn't hold true for this case

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace LevenshteinDistance
{
class Program
{
    static void Main(string[] args)
    {
        LevenshteinDistance ld = new LevenshteinDistance();
        string a="B";
        string b="Book";
        string c = "Sick";

        Console.WriteLine("{0} = Max( {1}, {2} )", ld.Compute(b, c), ld.Compute(a, c), ld.Compute(a, b)); 
        if (ld.Compute(b, c) == Math.Max(ld.Compute(a, c), ld.Compute(a, b)))
            Console.WriteLine("Equal");
        else
            Console.WriteLine("Not Equal");
        Console.ReadKey();

    }

}

class LevenshteinDistance
{
    //****************************
    // Get minimum of three values
    //****************************

    private int Minimum(int a, int b, int c)
    {
        int min;

        min = a;
        if (b < min)
        {
            min = b;
        }
        if (c < min)
        {
            min = c;
        }
        return min;

    }

    //*****************************
    // Compute Levenshtein distance
    //*****************************

    public int Compute(string s, string t)
    {
        int[,] matrix; // matrix
        int n; // length of s
        int m; // length of t
        int i; // iterates through s
        int j; // iterates through t
        char s_i; // ith character of s
        char t_j; // jth character of t
        int cost; // cost

        // Step 1
        n = s.Length;
        m = t.Length;
        if (n == 0)
        {
            return m;
        }
        if (m == 0)
        {
            return n;
        }
        matrix = new int[n + 1, m + 1];

        // Step 2

        for (i = 0; i <= n; i++)
        {
            matrix[i, 0] = i;
        }

        for (j = 0; j <= m; j++)
        {
            matrix[0, j] = j;
        }

        // Step 3

        for (i = 1; i <= n; i++)
        {

            s_i = s[(i - 1)];

            // Step 4

            for (j = 1; j <= m; j++)
            {

                t_j = t[(j - 1)];

                // Step 5

                if (s_i == t_j)
                {
                    cost = 0;
                }
                else
                {
                    cost = 1;
                }

                // Step 6

                matrix[i, j] = Minimum(matrix[i - 1, j] + 1, matrix[i, j - 1] + 1, matrix[i - 1, j - 1] + cost);

            }

        }

        // Step 7

        return matrix[n, m];

    }

}

}

madan
Got the Algorithm from http://www.merriampark.com/ld.htm and converted it for C#
madan
+2  A: 

Doesn't work.

LD("BOB", "BOT") == 1
LD("BOT", "BOB") == 1

LD("BOB", "BOB") == 0
max(LD("BOB", "BOT"), LD("BOT", "BOB")) == 1

0 != 1

there are probably harder examples also...

egon
A: 

No, but this does:

lev(a, c) <= lev(a, b) + lev(b, c) (a.k.a "triangle inequality)

...and is used heavily as a heuristic by VP-Trees and BK-Trees.

Being a metric the levenshtein distance follows the triangle inequality:
http://en.wikipedia.org/wiki/Triangle_inequality

Regexident