ansaurus

Question

Levenshtein algorithm: How do I meet these text editing requirements?

Answer 1

+3 A:

You may also want to add Norvig's excellent article on spelling correction to your reading.

It's been a while since I've read it, but I remember it being very similar to what you're writing about.

Triptych 2009-01-27 19:30:21

Thanks, but I don't know Python.

Melkhiah66 2009-01-27 19:34:29

The article might be written in Python, but it is not about Python.

Bedwyr Humphreys 2009-01-27 20:14:54

Even without knowing Python code, you'll probably be able to follow the article. The only non-obvious stuff I see in that code is 'list comprehensions', which you can google for.

Triptych 2009-01-27 21:35:26

Answer 2

A:

You'll need to find a dictionary (a text file containing a list of words), and see if any of the strings you've created exist in the dictionary.

If they do, then add that to your list of words to suggest.
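As a rough illustration of the dictionary check, here is a minimal Python sketch. It assumes the candidate strings are generated as all single-edit variants of the input (the approach used in Norvig's article), which may differ from how the question's code builds them. The names `edits1`, `load_dictionary`, and `suggestions` are hypothetical, and the word-list file is assumed to hold one word per line.

```python
import string

def edits1(word, alphabet=string.ascii_lowercase):
    """Return every string exactly one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    swaps = [left + right[1] + right[0] + right[2:]
             for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in alphabet]
    inserts = [left + c + right for left, right in splits for c in alphabet]
    return set(deletes + swaps + replaces + inserts)

def load_dictionary(path):
    """Read a word list (assumed: one word per line) into a set."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def suggestions(word, dictionary):
    """Keep only the generated candidates that are real dictionary words."""
    return sorted(edits1(word.lower()) & dictionary)

# Small in-memory dictionary so the example runs without a word-list file.
words = {"hello", "help", "held", "world"}
print(suggestions("helo", words))
```

Storing the dictionary in a set rather than an array keeps each membership test constant-time, which matters because a single word can produce several hundred candidates.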

David 2009-01-27 19:46:00

All the words he is comparing with the misspelling are already in the dictionary.

nlucaroni 2009-01-27 20:01:58

Nowhere in the sample code do I see a dictionary being loaded in and stored into an array.

David 2009-02-02 19:10:01

Answer 3

A:

Why restrict the suggestion to a single word? Why not offer a set of words? If you really are restricted to a single word, you can order your results by a pre-calculated frequency of usage and pick the most common one. This frequency could be updated based on what users select from the suggestions.
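A minimal sketch of that frequency-based ordering, in Python. The class name `SuggestionRanker`, its methods, and the starting counts are all made up for illustration; real counts would come from a corpus or from usage logs.

```python
from collections import Counter

class SuggestionRanker:
    """Order candidate words by how often they have been used or selected."""

    def __init__(self, initial_counts=None):
        # Counter returns 0 for words it has never seen.
        self.counts = Counter(initial_counts or {})

    def rank(self, candidates, limit=3):
        # Most frequent first; ties broken alphabetically for stable output.
        ordered = sorted(candidates, key=lambda w: (-self.counts[w], w))
        return ordered[:limit]

    def record_selection(self, word):
        # Called when a user picks a suggestion, so it ranks higher next time.
        self.counts[word] += 1

ranker = SuggestionRanker({"the": 100, "then": 20, "thee": 1})
print(ranker.rank(["thee", "then", "the"]))           # full ordering
print(ranker.rank(["thee", "then", "the"], limit=1))  # single best suggestion
```

Passing `limit=1` covers the single-word case, while a larger limit returns the set of words suggested above.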

Also, in the case where there isn't a spelling error in the original word, you might want to prioritize the N+1 cases, which would be more like an autocomplete. In any case, I don't think there is one correct way to do it. If your requirements were more specific, it would be easier to narrow down.

Also, you don't need to know Python to understand the algorithms described in Norvig's article.

codelogic 2009-01-27 19:58:03

Answer 4

+2 A:

As I've said elsewhere, Boyer-Moore isn't really apt for this. Since you want to search for multiple strings simultaneously, the algorithm of Wu and Manber should be more to your liking.

I've posted proof-of-concept C++ code in an answer to another question. Heed the caveats mentioned there.

Konrad Rudolph 2009-01-27 20:00:03

Answer 5

A:

If I understand you correctly, then there is no correct answer to your question. You are identifying up to three suggestions for a given word using Levenshtein - it is up to you to come up with a rule to decide which one to use and which ones to filter out. Or perhaps you should use them all?

The Damerau extension to Levenshtein might also be of interest to you: it counts two swapped adjacent characters as a single edit with a score of 1, instead of the 2 that vanilla Levenshtein returns.
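To make the difference concrete, here is a small Python sketch comparing the two scores. Note that `osa_distance` implements the restricted variant known as optimal string alignment, which is a common simplification and not the full Damerau-Levenshtein algorithm; both function names are chosen here for illustration.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]

def osa_distance(a, b):
    """Optimal string alignment: Levenshtein plus adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Two adjacent characters swapped: count it as one edit.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]

print(levenshtein("teh", "the"))   # 2
print(osa_distance("teh", "the"))  # 1
```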

2009-01-27 20:11:45
