views:

590

answers:

4

How does a T9 dictionary work? What is the data structure behind it. If we type '4663' we get 'good' when we press down button we get 'gone' then 'home' etc...

EDIT: If the user types in 46 then it should show 'go' and when pressed down arrow should show 'gone' etc...

+9  A: 

It can be implemented in several ways, one of them is Trie. The route is represented by the digits and the nodes point to collection of words.

It can be implemented using nested hash tables as well, the key of the hash table is a letter and on every digit the algorithm calculates all possible routes (O(3^n) routes).

Elisha
+1 for mentioning Trie.
Jack
A: 

4663

translates to

{G,H,I}{M,N,O}{M,N,O}{D,E,F}

T9 works by filtering the possibilities down sequentially starting with the first possible letters. So the first step in your example will be to filter the dictionary list to all words beginning with G, H, or I. Next step, take that list and filter the second letters by M, N, O. And so on...

David Johnson
+1  A: 

I think that T9 is using a bit-map-based Trie. On the first level there is a 32-bit (I assume they expanded to 32) word where each bit represents a letter. All letters that are valid as start letters for a word have their bit set.

In the next level there is one 32-bit map for each of those bits that were set in the previous level. In these maps each bit is set if the corresponding letter is valid as a second letter in a word, with the starting letter decided by the first level.

The structure then continues. Using one-bit-per-letter makes a very efficient storage. The addressing scheme however must be challenging. There might also be some kind of optimization using the above schema for short words while using something else for longer words.

Anders Abel
+1  A: 

I'd probably do it by starting with a dictionary, then (for each word) push the word onto a list keyed by the numbers that could form the word.

Then, you'll probably need to sort the resulting lists in some fashion, so more likely words appear before less likely words (if you have the space, I'd also include a small area for a counter, so we can incrementally re-sort these OR simply mov ethe last used to the front of the suggestion list, so we, over time, tend towards giving the user a better answer).

That way, when you have 4663 as an input, you can quite simply retrieve the relevant linked list wit ha roughly O(1) hash table lookup.

Vatine