I am effectively trying to solve the same problem as this question:
minus the requirement that words represent physical objects. The answers and the edited question suggest that a good start is to build a frequency list of n-grams, using Wikipedia text as the corpus. Before I start downloading the mammoth Wikipedia dump, does anyone know whether such a list already exists?
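To be concrete about what I mean by an n-gram frequency list, here is a minimal sketch of the counting step. It assumes the corpus has already been reduced to plain text; the `ngram_counts` helper and the sample sentence are made up for illustration and are not taken from the other question. It works on any sequence, so it should cover both word n-grams (pass a list of words) and character n-grams (pass a string).

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens in a sequence."""
    tokens = list(tokens)
    # Slide a window of width n over the sequence by zipping shifted copies.
    windows = zip(*(tokens[i:] for i in range(n)))
    return Counter(windows)


# Toy stand-in for real corpus text (hypothetical sample, not Wikipedia data).
sample = "the cat sat on the mat and the cat slept"
bigrams = ngram_counts(sample.split(), 2)

# Print the bigrams from most to least frequent.
for gram, count in bigrams.most_common():
    print(" ".join(gram), count)
```

For a full Wikipedia dump the counts would presumably have to be accumulated article by article rather than in a single pass over one in-memory string, which is exactly why I would rather reuse an existing list.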
PS: If the original poster of the previous question sees this, I would love to know how you went about solving the problem, as your results seem excellent :-)