views:

333

answers:

1

I noticed some posts here on string matching, which reminded me of an old problem I'd like to solve. Does anyone have a good Levenstein-like algorithm that is weighted toward Qwerty keyboards?

I want to compare two strings, and and allow for typos. Levenstein is okay, but I'd prefer to also accept spelling errors based on the physical distance between keys on Qwerty keyboard. In other words, the algorithm should prefer "yelephone" to "zelephone" since the "y" key is located nearer to the "t" key than to the "z" key on most keyboards.

Any help would be great... this feature isn't central to my project, so I don't want to veer off into a rat-hole when I should be doing something more productive.

+9  A: 

In bioinformatics when you align two sequences of DNA you might have a model that has a different cost based on if the substitution is a transition or a transversion. This is exactly what you want but instead of a 4x4 matrix, you want a 40x40 matrix or some, dare I say distance function? So the cost of a replacement is from the matrix/function, not a constant.

CAVEAT: Be sure that deletions and insertions are weighted properly though, so they aren't over accepted as the minimum. You'll end up with a string of insertions/deletions/no-change-substitution characters.

The new function you are trying to minimize would be:

d[i, j] := minimum(
    d[i-1, j] + del_cost,
    d[i, j-1] + ins_cost,
    d[i-1, j-1] + keyboard_distance( s[i], t[j] )
)
nlucaroni