ansaurus

Question

How to best match two strings?

Answer 1

A:

Maybe this one will be useful for ya:

http://stackoverflow.com/questions/53480/fuzzy-text-sentencestitles-matching-in-c

Lukas Šalkauskas 2008-12-09 08:05:41

Answer 2

+6 A:

The Levenstein distance is such a measure. It basically tells you how many characters need to be edited, deleted or added, to get from the first to the second string. I'm not sure whether some database systems support that.

But I know for sure that a much more simplified algorithm named Soundex is supported in some database systems.

Bananeweizen 2008-12-09 08:05:59

Answer 3

+1 A:

I think the problem you're looking for is called Edit Distance. It is expensive to compute in general, but if you are looking for strings within small edit distance of other strings, it is not so bad. There is more information in the Wikipedia article.

Norman Ramsey 2008-12-09 08:06:04

Answer 4

A:

Would this be of help? I just ran into it. http://stackoverflow.com/questions/188476/comparing-two-strings-producing-a-numeric-delta

Skunk 2008-12-09 08:07:41

Answer 5

+2 A:

It depends upon your criteria for similarity. Other people have already referred you to Levenstein distance (edit distance is the same thing). That's usually pretty good, and definitely more language-independent than something like soundex. However, be aware that Levenstein difference does not handle transposition very well. Thus:

Levenstein("copy", "cpoy") == 2

If you're trying to deal with human input, transpositions are fairly common. Whether that's a problem or not depends on your metrics for similarity.

It's been a while, but I believe Postgresql has levenstein() either built-in or available as a contrib C module.

Tom 2008-12-09 14:25:00

Answer 6

A:

How to best match two strings? Have them go out for coffee, and if they hit it off, dinner and a movie. Or maybe they could do some peer programming? It depends on the strings, really. Even coffee can often be tricky.

Yar 2010-06-17 16:17:29

ansaurus

tags:

views:

answers:

How to best match two strings?

related questions