I'm building a spelling corrector for search engine queries by implementing the method described in "Spelling correction as an iterative process that exploits the collective knowledge of web users".
The high-level approach is as follows: for a given query, generate correction candidates for each unigram and bigram (words from the query log within a certain edit distance), then run a modified Viterbi search to find the most likely sequence of candidates given bigram frequencies. Repeat this process on the corrected query until the output no longer changes, i.e. until no more probable sequence can be found.
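To make the candidate-generation step concrete, here is a minimal sketch of how I picture it. It assumes plain, unweighted Levenshtein distance and a brute-force scan over the query-log vocabulary; the names `edit_distance`, `candidates`, `log_counts` and `max_dist` are mine for illustration, not the paper's, and it only handles unigrams.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute, all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def candidates(word, log_counts, max_dist=2):
    """Words from the query log within max_dist of `word`, plus `word` itself.

    log_counts maps each word seen in the query log to its frequency
    (a hypothetical structure; a real log would need an index, not a scan).
    """
    found = {w for w in log_counts if edit_distance(word, w) <= max_dist}
    found.add(word)  # always allow leaving the word unchanged
    # Most frequent first, ties broken alphabetically for determinism.
    return sorted(found, key=lambda w: (-log_counts.get(w, 0), w))


log_counts = Counter({"britney": 50, "brittany": 20, "spears": 40})
print(candidates("britny", log_counts))
```

The linear scan is only workable for a toy vocabulary; for a real query log some kind of index over the vocabulary would be needed.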
The modification to the Viterbi search is this: if two adjacent words are both found in a trusted lexicon, at most one of them may be changed. This is especially important for avoiding the correction of correctly spelled single-word queries to words of higher frequency.
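For reference, this is roughly how I understand the constrained search, as a sketch rather than the paper's exact algorithm. It assumes every candidate list contains the original word, that `bigram_logp(prev, cur)` is some smoothed log-probability function I supply, and that `"<s>"` marks the start of the query; all of these names are hypothetical. It ignores bigram candidates (word merges and splits) and does not special-case single-word queries.

```python
def constrained_viterbi(words, cands, bigram_logp, lexicon):
    """Most likely candidate sequence under the adjacent-words constraint.

    words       : original query tokens
    cands       : cands[i] is the candidate list for words[i]
                  (assumed to include words[i] itself)
    bigram_logp : function (prev, cur) -> log-probability
    lexicon     : set of trusted words
    """
    # best[c] = (score, path) of the best path ending in candidate c
    best = {c: (bigram_logp("<s>", c), [c]) for c in cands[0]}
    for i in range(1, len(words)):
        both_trusted = words[i - 1] in lexicon and words[i] in lexicon
        new_best = {}
        for c in cands[i]:
            for p, (score, path) in best.items():
                # Skip transitions that change two adjacent trusted words.
                if both_trusted and p != words[i - 1] and c != words[i]:
                    continue
                s = score + bigram_logp(p, c)
                if c not in new_best or s > new_best[c][0]:
                    new_best[c] = (s, path + [c])
        best = new_best
    return max(best.values(), key=lambda t: t[0])[1]


# Toy log-probabilities; unseen bigrams get a flat penalty.
table = {("<s>", "dog"): -1.0, ("dog", "cabins"): -1.0,
         ("<s>", "log"): -3.0, ("log", "cabin"): -3.0}
logp = lambda prev, cur: table.get((prev, cur), -10.0)

words = ["log", "cabin"]
cands = [["log", "dog"], ["cabin", "cabins"]]
print(constrained_viterbi(words, cands, logp, {"log", "cabin"}))
print(constrained_viterbi(words, cands, logp, set()))
```

In this toy example "dog cabins" scores highest without the constraint, but with both original words in the lexicon that path is blocked and the query is left as "log cabin".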
My question is where to find such a lexicon. It should be in English and contain proper nouns (first/last names, places, brand names, etc.) likely to show up in search queries, as well as both common and uncommon English words. Even a push in the right direction would be useful.
Also, if anyone reading this has suggestions for improving on the methodology described in the paper, I am open to those as well, given that this is my first foray into NLP.