I'm working on a Google App Engine program that will require some basic spell checking features. Normally iSpell or its cousins would be options, but I'm not sure they will work on GAE. Are there other strategies/tools that would work in that environment?
A very minimal, pure-Python spell checker can be found here: http://norvig.com/spell-correct.html
The big.txt file Norvig uses to train his spell checker is too large to upload to App Engine at 6.2 megabytes, but the NWORDS dict that results from training is only ~650K when pickled. So one solution might be to pre-train the spell checker, pickle the results and include the pickled training data in your application.
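A rough sketch of that pre-train-and-pickle idea, in case it helps. The helper names (words, train, save_model, load_model) and the file name nwords.pkl are my own placeholders, not anything from Norvig's page or from App Engine. One assumption worth flagging: Norvig's training code builds a defaultdict with a lambda default, which pickle can't serialize, so this version counts words with a Counter and stores a plain dict instead.

```python
import collections
import pickle
import re


def words(text):
    """Split raw text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(features):
    """Count how often each word occurs in the training corpus."""
    return collections.Counter(features)


def save_model(model, path):
    """Pickle the counts as a plain dict so they load without extra classes."""
    with open(path, "wb") as f:
        pickle.dump(dict(model), f, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Load the pre-trained word counts back into memory."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

The idea would be to run something like save_model(train(words(open("big.txt").read())), "nwords.pkl") once on your own machine, upload only nwords.pkl with the app, and call load_model("nwords.pkl") at startup. Unseen words won't get a default count this way, so lookups should use model.get(word, 1) or similar.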
This spell checker might not be good enough for your needs, and the way I've proposed you integrate it into your app might be an absolutely terrible idea. I'm really not sure. Might be interesting to try, though.
I personally would try to go down the route of using Google's API for spellcheck. I'm trying to find it now, but I believe their exposed web service includes a spell checker.
It's always tough finding good Python libraries that are actually being maintained. On the other hand, I imagine Google's service should be around and dependable for a while.
Not sure in what format the results come back, but on your side, you could implement your own Levenshtein distance function to see how close the results are to your word in question.
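For what it's worth, here is a minimal sketch of an edit-distance function in pure Python, using the standard dynamic-programming approach with two rows. The names levenshtein and closest are just illustrative, and closest assumes you already have a list of candidate suggestions from whatever service or word list you end up using.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # Keep the shorter string in b so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


def closest(word, candidates):
    """Sort candidate corrections by edit distance to word, nearest first."""
    return sorted(candidates, key=lambda c: (levenshtein(word, c), c))
```

Something like closest("speling", suggestions)[0] would then pick the suggestion nearest to the misspelled word. Ties are broken alphabetically here, which is an arbitrary choice; ranking by word frequency would probably work better in practice.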
Mark