
For a given word I'd like to find the n closest misspellings. I was wondering whether an open-source spell checker such as aspell would be useful in this context, or whether you have other suggestions.

For example, 'health' would give me: ealth, halth, heallth, healf, ...

A: 

Spelling correction tools take misspelled words and offer possible correctly spelled alternatives. You seem to want to go in the other direction.

Going from a correctly spelled word to a set of possible misspellings could probably be performed by applying a set of mutation heuristics to common words. These heuristics might do things like:

  • randomly adding or removing single characters
  • randomly transposing pairs of adjacent characters
  • substituting characters with their neighbours on the keyboard layout
  • applying common "point" misspellings, e.g. transposing "ie" to "ei", or doubling or undoubling "l"s.
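A rough sketch of these heuristics in Python, for illustration only. It enumerates every candidate each rule can produce (all single-edit variants, plus a few point rules) and then samples n of them. The keyboard map is a partial, approximate QWERTY layout, and POINT_RULES (including the "th" to "f" rule that produces "healf") is a hypothetical example list; both are assumptions you would tune yourself.

```python
import random
import string

# Approximate QWERTY neighbours (an assumption; lowercase letters only).
KEY_NEIGHBOURS = {
    "q": "wa", "w": "qeas", "e": "wrsd", "r": "etdf", "t": "ryfg",
    "y": "tugh", "u": "yihj", "i": "uojk", "o": "ipkl", "p": "ol",
    "a": "qwsz", "s": "adwezx", "d": "sferxc", "f": "dgrtcv", "g": "fhtyvb",
    "h": "gjyubn", "j": "hkuinm", "k": "jliom", "l": "kop",
    "z": "asx", "x": "zcsd", "c": "xvdf", "v": "cbfg", "b": "vngh",
    "n": "bmhj", "m": "njk",
}

# Hypothetical "point" rules as (pattern, replacement) pairs.
POINT_RULES = [("ie", "ei"), ("ei", "ie"), ("ll", "l"), ("l", "ll"), ("th", "f")]


def deletions(word):
    # Remove one character at each position.
    return {word[:i] + word[i + 1:] for i in range(len(word))}


def insertions(word):
    # Insert one lowercase letter at each position.
    return {word[:i] + c + word[i:]
            for i in range(len(word) + 1) for c in string.ascii_lowercase}


def transpositions(word):
    # Swap each pair of adjacent characters.
    return {word[:i] + word[i + 1] + word[i] + word[i + 2:]
            for i in range(len(word) - 1)}


def keyboard_substitutions(word):
    # Replace a character with one of its keyboard neighbours.
    return {word[:i] + n + word[i + 1:]
            for i, ch in enumerate(word) for n in KEY_NEIGHBOURS.get(ch, "")}


def point_misspellings(word):
    # Apply each rule at every position where its pattern occurs.
    out = set()
    for pattern, repl in POINT_RULES:
        start = word.find(pattern)
        while start != -1:
            out.add(word[:start] + repl + word[start + len(pattern):])
            start = word.find(pattern, start + 1)
    return out


def misspellings(word):
    """All candidates produced by the heuristics, minus the word itself."""
    candidates = (deletions(word) | insertions(word) | transpositions(word)
                  | keyboard_substitutions(word) | point_misspellings(word))
    candidates.discard(word)
    return candidates


def sample_misspellings(word, n, seed=None):
    """Pick up to n candidates at random (seed makes it reproducible)."""
    pool = sorted(misspellings(word))
    return random.Random(seed).sample(pool, min(n, len(pool)))
```

For 'health', the candidate set includes 'ealth', 'halth', 'heallth' and 'healf'. Enumerating first and sampling afterwards keeps the output free of duplicates. Note that none of these rules says anything about how *likely* a given misspelling is.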

Going from a correctly spelled word to a set of common misspellings (the ones people actually make) is much harder. Probably the only reliable way to do this would be to instrument a spelling checker package used by a large community of users, record the corrections those users actually accept, and aggregate the results. That is probably (!) beyond the scope of your project.
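If you did have such a record, the aggregation step itself is simple. The following is a minimal sketch that assumes a hypothetical log of (typed word, accepted correction) pairs. No real checker emits exactly this format, and collecting the data is the hard part.

```python
from collections import Counter, defaultdict


def aggregate_corrections(events):
    """Count misspellings per correct word.

    events: iterable of (typed_word, accepted_correction) pairs
    (a hypothetical log format).
    """
    counts = defaultdict(Counter)
    for typed, corrected in events:
        if typed != corrected:  # ignore words that were already correct
            counts[corrected][typed] += 1
    return counts


def common_misspellings(counts, word, n):
    """Return the n most frequently observed misspellings of word."""
    return [typed for typed, _ in counts.get(word, Counter()).most_common(n)]
```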

Stephen C