I am interested in algorithm in T-SQL calculating Levenshtein distance.
It seems I have googled a newer version of the Arnold Fribble's implementation.
Alexander Prokofyev
2009-02-18 11:46:57
+3
A:
Arnold Fribble proposes this one:
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_NULLS ON
GO
CREATE FUNCTION edit_distance_within(@s nvarchar(4000), @t nvarchar(4000), @d int)
RETURNS int
AS
BEGIN
DECLARE @sl int, @tl int, @i int, @j int, @sc nchar, @c int, @c1 int,
@cv0 nvarchar(4000), @cv1 nvarchar(4000), @cmin int
SELECT @sl = LEN(@s), @tl = LEN(@t), @cv1 = '', @j = 1, @i = 1, @c = 0
WHILE @j <= @tl
SELECT @cv1 = @cv1 + NCHAR(@j), @j = @j + 1
WHILE @i <= @sl
BEGIN
SELECT @sc = SUBSTRING(@s, @i, 1), @c1 = @i, @c = @i, @cv0 = '', @j = 1, @cmin = 4000
WHILE @j <= @tl
BEGIN
SET @c = @c + 1
SET @c1 = @c1 - CASE WHEN @sc = SUBSTRING(@t, @j, 1) THEN 1 ELSE 0 END
IF @c > @c1 SET @c = @c1
SET @c1 = UNICODE(SUBSTRING(@cv1, @j, 1)) + 1
IF @c > @c1 SET @c = @c1
IF @c < @cmin SET @cmin = @c
SELECT @cv0 = @cv0 + NCHAR(@c), @j = @j + 1
END
IF @cmin > @d BREAK
SELECT @cv1 = @cv0, @i = @i + 1
END
RETURN CASE WHEN @cmin <= @d AND @c <= @d THEN @c ELSE -1 END
END
GO
Alexander Prokofyev
2009-02-18 11:45:04
@Alexander, it seems to work but I would change your variable names to something more meaningfull. Also, I'd get rid of @d, you know the length of the two strings in your input.
Lieven
2009-02-18 12:01:51
@Lieven: It isn't my implementation, the author is Arnold Fribble. @d parameter is a maximal allowed difference between strings after reaching which they are considered too diverse and function returns -1. It's added because the algorithm in T-SQL works too slowly.
Alexander Prokofyev
2009-02-18 12:26:50
+3
A:
IIRC, with SQL Server 2005 and later you can write stored procedures in any .NET language: Using CLR Integration in SQL Server 2005. With that it shouldn't be hard to write a procedure for calculating Levenstein distance.
A simple Hello, World! extracted from the help:
using System;
using System.Data;
using Microsoft.SqlServer.Server;
using System.Data.SqlTypes;
public class HelloWorldProc
{
[Microsoft.SqlServer.Server.SqlProcedure]
public static void HelloWorld(out string text)
{
SqlContext.Pipe.Send("Hello world!" + Environment.NewLine);
text = "Hello world!";
}
}
Then in your SQL Server run the following:
CREATE ASSEMBLY helloworld from 'c:\helloworld.dll' WITH PERMISSION_SET = SAFE
CREATE PROCEDURE hello
@i nchar(25) OUTPUT
AS
EXTERNAL NAME helloworld.HelloWorldProc.HelloWorld
And now you can test run it:
DECLARE @J nchar(25)
EXEC hello @J out
PRINT @J
Hope this helps.
Leandro López
2009-02-18 11:48:27