ansaurus

Question

Answer 1

A:

I think you need to do a multi step process not just Levenshtein. First I would check if the input word is a form of target word. That would catch your 3rd example and also no worry about adding dots. You could also use this step to catch synonyms. The next step is to check the length difference of the two strings.

If the difference is 0 you can do a letter for letter comparison to place the dots. If you don't want to show all dots then you should keep a count of dots placed and once over the limit display some error message. (Sorry that was incorrect)

If the difference is shows the input is longer you need to check for a letter to be delete would fix the problem. Here you can use Levenshtein to see how close they are if they are too far away show your error message if it is in range you will need to do the steps of Levenshtein in reverse and mark the changes somehow. Not sure how you want to show that a letter needs deleted.

If the difference shows the input is shorter you can do use Levenshtein distance to see if the two words are close enough or show the error. Then do the steps in reverse again adding dots for insertions and dots for substitutions.

Actually the last two steps can be combined into one function that runs through the algorithm and remembers the insert delete or substitution and changes the output accordingly.

Jeff Beck 2009-12-31 16:01:35

Answer 2

+3 A:

The terminology you are looking for is called "edit distance." Using something like Levenshtein distance will tell you the number of operations needed to transform one string into the other (insertions, deletion, substitutions, etc).

Here is a list of other "editing distance" algorithms.

Once you decide that a word is "close enough" (i.e. it doesn't exceed some threshold of edits needed), you can show where the edits need to occur (by showing dots).

So how do you know where to put the dots?

The interesting thing about the "Levenshtein distance" is it uses a M x N matrix with one word on each axis (see the sample matrix in Levenshtein article). Once you create the matrix, you can determine which letters require "additional edits" to be correct. That's where you put the dots. If the letter requires "no additional edits," you simply print the letter. Pretty cool.

Robert Cartaino 2009-12-31 16:02:03

Answer 3

+3 A:

First, take a look at the Levenshtein algorithm on wikipedia. Then go on and look at the examples and the resulting matrix d on the article page:

                *k*  *i*  *t*  *t*  *e*  *n*
        >0    1    2    3    4    5    6 
*s*      1      >1    2    3    4    5    6 
*i*      2       2   >1    2    3    4    5 
*t*      3       3    2   >1    2    3    4 
*t*      4       4    3    2   >1    2    3 
*i*      5       5    4    3    2   >2    3 
*n*      6       6    5    4    3    3   >2 
*g*      7       7    6    5    4    4   >3

The distance is found on lower right corner of the matrix, d[m, n]. But from there it is now possible to follow backtrack the minimum steps to the upper left of of the matrix, d[1, 1]. You just go left, up-left, or up at each step whichever minimizes the path. In the example above, you'd find the path marked by the '>' signs:

  s i t t i n g      k i t t e n
0 1 1 1 1 2 2 3    0 1 1 1 1 2 2 3
  ^       ^   ^      ^       ^   ^
  changes in the distance, replace by dots

Now you can find on the minimum path at which locations d[i,j] the distance changes (marked by ^ in the example above), and for those letters you put in the first (or second) word a dot at position i (or j respectively).

Result:

  s i t t i n g      k i t t e n
  ^       ^   ^      ^       ^   ^
  . i t t . n .      . i t t . n .

catchmeifyoutry 2009-12-31 16:06:33

Thank you!This helps me a lot (it also helps me understanding the matrix)

Harmen 2009-12-31 16:38:33

you're welcome, and happy new year :)

catchmeifyoutry 2010-01-01 06:45:33

ansaurus

tags:

views:

answers:

Place dots where a word is misspelled

So how do you know where to put the dots?

related questions