A word frequency list is what you want. You can also make your own, or customize one for a particular domain, and doing so is a nice way to become familiar with some good libraries. Start with some text, such as the text discussed in this question, then try out some variants of this back-of-the-envelope script:
import os
import string
from collections import defaultdict

from nltk.stem.porter import PorterStemmer

ps = PorterStemmer()
word_count = defaultdict(int)
source_directory = '/some/dir/full/of/text'

for root, dirs, files in os.walk(source_directory):
    for item in files:
        current_text = os.path.join(root, item)
        with open(current_text, 'r') as f:
            words = f.read().split()
        for word in words:
            # Strip punctuation, lowercase, then stem so variants count together.
            entry = ps.stem(word.strip(string.punctuation).lower())
            word_count[entry] += 1

results = [[word_count[i], i] for i in word_count]
print(sorted(results))
Running this over a couple of downloaded books gives the following for the most common words:
[2955, 'that'], [4201, 'in'], [4658, 'to'], [4689, 'a'], [6441, 'and'], [6705, 'of'], [14508, 'the']]
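If you only want the head of that list rather than scanning the tail of the full sorted output, a descending sort over the same results list gives it directly (a small sketch; the cutoff of 10 is arbitrary):

# results is the [count, word] list built above; sort descending and take the top 10.
for count, word in sorted(results, reverse=True)[:10]:
    print(word, count)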
See what happens when you filter out the most common x, y, or z words from your queries, or leave them out of your text search index entirely. You might also get interesting results if you include real-world data: for example, "community" and "wiki" are probably not common words on a generic list, but on SO that obviously wouldn't be the case, and you might want to exclude them.
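As a rough sketch of that filtering idea (the cutoff of 50 and the filter_query name are just illustrative; word_count is the dict built by the script above):

import string
from collections import Counter
from nltk.stem.porter import PorterStemmer

ps = PorterStemmer()

# Treat the N most frequent stems in your corpus as stop words (N = 50 is arbitrary).
stop_words = {word for word, _ in Counter(word_count).most_common(50)}

def filter_query(query):
    # Normalize the query the same way the corpus was processed,
    # then drop anything that made it onto the stop-word list.
    terms = (ps.stem(t.strip(string.punctuation).lower()) for t in query.split())
    return [t for t in terms if t and t not in stop_words]

print(filter_query("is community wiki the same as a normal question?"))

The same stop_words set can be applied when building the index itself, so those terms never get stored in the first place.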