Google has suggestion come up when you make a typo entry,how do they do it?
maybe these links can help
http://answers.google.com/answers/threadview?id=526503
http://googlesystem.blogspot.com/2007/04/simplified-version-of-googles-spell.html
just having frequency count of words is enough, I dont think you need somehthing complicated for this, not even machine learning. No need of learn a model. If you have entered something strange but not a typo you will notice that they try to "correct" it also.
Peter Norvig (Director of Research at Google) has written a little introductory piece on spell checking in Python using statistical heuristics.
It's a great read, and shows how to use statistical heuristics in a very simple way. It could be ported to C# (with LINQ) quite easily (Python's list comprehensions are very close to Linq expressions).
The core part of this snippet is having all simple typos for a word (edit1 function) the C# equivalent is the following
public static IEnumerable<string> Edit1(string word){
const string alphabet = "abcdefghijklmnopqrstuvwxyz";
var s = from i in Enumerable.Range (0, word.Length - 1)
select new Pair<string>(word.Substring (0, i), word.RSlice(i));
var deletes = from p in s
select p.First + p.Second.RSlice (1);
var transposes = from p in s
let b1 = p.Second
where b1.Length > 2
select p.First + b1 [1] + b1 [0] + b1.RSlice (2);
var replaces = from p in s
let b = p.Second
where b.Length > 0
from c in alphabet select p.First + c + b.RSlice (1);
var inserts = from p in s
from c in alphabet
select p.First + c + p.Second;
return deletes.Concat (transposes).Concat( replaces)
.Concat(inserts).Distinct ();}
where Pair is a poor man tuple (obvious code not included) and RSlice is a poor-man string-only right splicing :
public static class Extensions {
public static string RSlice (this string input, int i)
{
if (i > input.Length - 1)
return "";
return input.Substring (i);
}}
Once you got the edits for a word, you look for the word in the dictionary or the existing words of the edits (selecting the most frequent word) or the words for edits1(edits1(word)) (selecting the most frequent word). Surprisingly, this can be quite fast and quite accurate. I'll have a link to my blog for the whole stuff ported.
Edit: oops just saw that a link in an answer above pointed to a pointer to the same Norvig's piece...