ansaurus

Question

What are the things should we consider while writing a Spell Checker?

Answer 1

+5 A:

Before investing too much trying to unravel that i'd first look to already existing implementations like Aspell or netspell for two main reasons

Not much point in re-inventing the wheel. Spell checking is much trickier than it first appears and it makes sense to build on work that has already been done
If your interest is finding out how to do it, the source code and community will be a great benefit should you decide to implement your own anyway

Conrad 2009-04-30 13:25:53

That said, it's quite an interesting problem to work on. Reinventing the wheel can be educational too.

Richard Nichols 2009-04-30 14:00:20

Answer 2

+1 A:

Edit Distance is the theory you need to write a spell checker. You also need a dictionary. Most UNIX systems come with a dictionary already installed for your locale.

jldugger 2009-04-30 13:27:02

Answer 3

A:

Under linux/unix you have ispell. Why reinventing the whell.

Luixv 2009-04-30 13:43:31

It's ironic that in an answer about spell checking you've spelt "wheel" incorrectly...!

Sean 2009-04-30 13:45:34

They also missed a hyphen, but I doubt that was meant quite as ironically. :)

Joe Swan 2009-04-30 13:46:45

Sorry. It was not ironically (unfortunately). In fact it was a typo. You a get 1 point from me :-)

Luixv 2009-04-30 14:26:51

Answer 4

A:

Much depends on your use case. For example:

Is your dictionary very small (about twenty words)? In this case it probably is better to precompute all possible nearby mistaken words and use a table/hash lookup.
What is your error model? Aspell has at least two (one for spelling errors caused by nearby letters on the keyboard, and the other for spelling errors caused by the way a word sounds).
How dynamic is your dictionary? Can you afford to do a massive preparation in order to get an efficient retrieval?
You may need a "word equivalence" measure like Double Metaphone, in addition to edit distance.
You can get some feel by reading Peter Norvig's great description of spelling correction.
And, of course, whenever possible, steal code. Do not reinvent the wheel without a reason - a reason could be a very special domain, a special way your users make spelling mistakes, or just to learn how it's done.

Yuval F 2009-04-30 13:47:48

Thanks. Your suggestions have been very helpful.

sul4bh 2009-05-01 10:16:14

Glad they did. Good luck.

Yuval F 2009-05-01 14:15:54

Answer 5

A:

I just finished implementing a spell checker and used a combination of the following in getting a list of "suggested" words

Phonetic hashing of the "misspelled" word to lookup a hash of identical dictionary hashed real words (for java check out Apache Commons Codec for a suitable library). The phonetic hash of your dictionary file can be precomputed.
Edit distance between the input and the potentials (this is reasonably expensive so you need to reduce the list first with something like a phonetic hash, assuming a higher volume load - in my case, a server based spell check)
A known list of common misspellings, e.g. recieve vs. receive.
An ordered list of the most common words in the english language

Essentially I weighted each potential word primarily based on edit-distance and commonality. e.g. if word probability is a percentage, then

weight = edit-distance *  100 / probability

(lower weights are better)

But then I also also override any result with the known common misspellings (i.e. these always float to the top suggested result).

There may be better ways, but this worked pretty well.

You may also wish to ignore ALL CAPS words, initials etc, so choosing what to ignore is also something to think about.

Richard Nichols 2009-04-30 13:50:08

ansaurus

tags:

views:

answers:

What are the things should we consider while writing a Spell Checker?

related questions