(I am not positive on this, please correct me if I am wrong)
The system that Google Scribe uses (or at least a very similar one) would essentially use a tree-like data structure, such as a trie, for storing all possible words. A search over that structure finds every way you could finish your word based on the known vocabulary (probably based off of older search queries stored in their database), then orders the candidates by how commonly they occur.
For instance:
I type: 'a'
Vocab: 'at' 'apple' 'atrocious'
So: if 'at' is used the most, 'apple' second most, and 'atrocious' the least, the suggestions come back in that order.
Like I said, I'm not sure if this is the system they use, but it should have similar results.
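To make the idea concrete, here is a minimal sketch of a prefix tree (trie) that stores a usage count with each word and returns completions ordered by that count. This is only an illustration of the approach described above, not how Google Scribe is actually built. The names `TrieNode`, `Trie`, `add` and `complete` are made up for this example, and the counts (50, 20, 3) are invented numbers.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # maps a character to the next TrieNode
        self.count = 0      # how often the word ending at this node was seen


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def add(self, word, count=1):
        """Insert a word, or add to its count if it is already stored."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.count += count

    def complete(self, prefix, limit=5):
        """Return up to `limit` known words starting with `prefix`, most used first."""
        # Walk down to the node that represents the prefix.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # nothing in the vocabulary starts with this prefix

        # Collect every stored word in the subtree below that node.
        found = []
        stack = [(node, prefix)]
        while stack:
            current, word = stack.pop()
            if current.count > 0:
                found.append((word, current.count))
            for ch, child in current.children.items():
                stack.append((child, word + ch))

        # Highest count first; ties are broken alphabetically.
        found.sort(key=lambda pair: (-pair[1], pair[0]))
        return [word for word, _ in found[:limit]]


trie = Trie()
trie.add("at", 50)         # invented counts, for illustration only
trie.add("apple", 20)
trie.add("atrocious", 3)

print(trie.complete("a"))  # ['at', 'apple', 'atrocious']
```

Typing more letters just narrows the subtree that gets searched: `trie.complete("at")` only returns 'at' and 'atrocious'.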
For retrieving occurrence likelihood, you could scan the documents you're searching and count how often each word appears, or just record each query as it is made and count words from your past searches.
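As a rough sketch of the counting step, assuming the documents or past queries are available as plain text, Python's `collections.Counter` can build the frequency table. The helper names `word_counts` and `suggest` and the sample `history` string are hypothetical, and a real system would need proper tokenizing and far more data.

```python
import re
from collections import Counter


def word_counts(text):
    """Count how often each lowercase word appears in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def suggest(prefix, counts, limit=5):
    """Return words starting with `prefix`, most frequent first."""
    matches = [(word, n) for word, n in counts.items() if word.startswith(prefix)]
    matches.sort(key=lambda pair: (-pair[1], pair[0]))  # ties go alphabetically
    return [word for word, _ in matches[:limit]]


# Stand-in for scanned documents or a log of past searches.
history = "at the store at noon an apple at home an apple pie"
counts = word_counts(history)

print(suggest("a", counts))  # ['at', 'an', 'apple']
```

This version scans every known word on each lookup, which is fine for a small vocabulary; for a large one, the counts would be loaded into a tree-like structure so that only words sharing the typed prefix are visited.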