views:

330

answers:

2

I am developing a Trie data-structure where each node represents a word. So words st, stack, stackoverflow and overflow will be arranged as

root
--st
---stack
-----stackoverflow
--overflow

My Trie uses a HashTable internally so all node lookup will take constant time. Following is the algorithm I came up to insert an item into the trie.

  1. Check item existence in the trie. If exist, return, else goto step2.
  2. Iterate each character in the key and check for the existence of the word. Do this until we get a node where the new value can be added as child. If no node found, it will be added under root node.
  3. After insertion, rearrange the siblings of the node under which the new node was inserted. This will walk through all the siblings and compare against the newly inserted node. If any of the node starts with same characters that new node have, it will be moved from there and added as child of new node.

I am not sure that this is the correct way of implementing a trie. Any suggestions or improvements are welcome.

Language used : C++

+6  A: 

The trie should look like this

                      ROOT
             overflow/    \st
                    O      O
                            \ack
                             O
                              \overflow
                               O

Normally you don't need to use hash tables as part of a trie; the trie itself is already an efficient index data structure. Of course you can do that.

But anyway, your step (2) should actually descend the trie during the search and not just query the hash function. In this way you find the insertion point readily and don't need to search for it later as a separate step.

I believe step (3) is wrong, you don't need to rearrange a trie and as a matter of fact you shouldn't be able to because it's only the additional string fragments that you store in the trie; see the picture above.

antti.huima
Thanks. I know standard `trie` looks like how you explained. But I need each node to represent a whole word and not just suffixes. Descending trie to find the location to insert seems to be decent idea. Thanks again.
Appu
It sounds like what you want is a HAT-trie. Look it up.
Justin Peel
@Appu: if your nodes have a parent link you can always get the full word pack even when the paths only contain "suffixes"
Ritsaert Hornstra
A: 

Did you had a chance to look at this one .. http://linux.thai.net/~thep/datrie/datrie.html#Insert

Vadi