
I am implementing a variation of a spell checker. After trying various routes to improve time efficiency, I am planning to try out a component that uses an n-gram model. Essentially, I want to prune the list of likely candidates before further processing. Would you guys happen to know whether one value of n (say 2) works better than another (say 3)?
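For reference, here is a minimal sketch of the kind of pruning step described above. It assumes character n-grams with a single `$` padding symbol at each end, an inverted index from n-gram to dictionary words, and a Jaccard-similarity cutoff. The helper names, the 0.3 threshold, and the tiny word list are all hypothetical choices for illustration, not part of any particular spell checker.

```python
from collections import defaultdict


def char_ngrams(word: str, n: int) -> set[str]:
    """Return the set of character n-grams of `word`, padded with '$' at both ends (assumed scheme)."""
    padded = f"${word.lower()}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_index(dictionary: list[str], n: int) -> dict[str, set[str]]:
    """Inverted index: map each n-gram to the set of dictionary words containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for word in dictionary:
        for gram in char_ngrams(word, n):
            index[gram].add(word)
    return index


def prune_candidates(query: str, index: dict[str, set[str]], n: int,
                     min_jaccard: float = 0.3) -> list[str]:
    """Keep words whose n-gram Jaccard similarity to `query` is >= min_jaccard, best first."""
    query_grams = char_ngrams(query, n)
    # Count shared n-grams; only words sharing at least one n-gram are ever looked at.
    shared: dict[str, int] = defaultdict(int)
    for gram in query_grams:
        for word in index.get(gram, ()):
            shared[word] += 1
    scored = []
    for word, overlap in shared.items():
        union = len(query_grams | char_ngrams(word, n))
        jaccard = overlap / union
        if jaccard >= min_jaccard:
            scored.append((jaccard, word))
    # Highest similarity first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored]


# Hypothetical toy dictionary to compare n = 2 and n = 3 on the same misspelling.
words = ["spelling", "spell", "spilling", "smelling", "selling", "checker"]
for n in (2, 3):
    print(n, prune_candidates("speling", build_index(words, n), n))
```

On this toy list, bigrams keep five candidates for "speling" while trigrams keep only three ("spelling", "spilling", "spell"), so a larger n prunes more aggressively at the same threshold. Whether that is "better" depends on how much recall you can afford to lose before the more expensive ranking step.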

A: 

According to this website, the average word length in English is 5.10 letters. I would assume that people are more likely to misspell longer words than shorter ones, so my gut feeling is to lean towards n-grams of around 3 to 5 letters, if possible.
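One rough way to see how n interacts with word length is to measure how much n-gram overlap survives a single-letter typo. The sketch below is a hypothetical check, assuming `$`-padded character n-grams and Jaccard similarity. The word pairs are made-up examples, not data from the site mentioned above.

```python
def char_ngrams(word: str, n: int) -> set[str]:
    """Character n-grams of `word`, padded with one '$' on each side (assumed scheme)."""
    padded = f"${word.lower()}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: str, b: str, n: int) -> float:
    """Jaccard similarity between the n-gram sets of two words."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return len(ga & gb) / len(ga | gb)


# A short word and a longer word, each with one substituted letter.
pairs = [("cat", "cot"), ("spelling", "spolling")]
for correct, typo in pairs:
    scores = {n: round(jaccard(correct, typo, n), 2) for n in range(2, 6)}
    print(correct, typo, scores)
# cat cot {2: 0.33, 3: 0.0, 4: 0.0, 5: 0.0}
# spelling spolling {2: 0.64, 3: 0.45, 4: 0.27, 5: 0.2}
```

A single substitution can destroy up to n n-grams, so for a three-letter word any n of 3 or more leaves no overlap at all, while a longer word still shares several n-grams at n = 3 to 5.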

Mark Rushakoff