views:

2088

answers:

3

So if I have to choose between a hash table or a prefix tree what are the discriminating factors that would lead me to choose one over the other. From my own naive point of view it seems as though using a trie has some extra overhead since it isn't stored as an array but that in terms of run time (assuming the longest key is the longest english word) it can be essentially O(1) (in relation to the upper bound). Maybe the longest english word is 50 characters?

Hash tables are instant look up once you get the index. Hashing the key to get the index however seems like it could easily take near 50 steps.

Can someone provide me a more experienced perspective on this? Thanks!

+7  A: 

It all depends on what problem you're trying to solve. If all you need to do is insertions and lookups, go with a hash table. If you need to solve more complex problems such as prefix-related queries, then a trie might be the better solution.

Adam Rosenfield
+1  A: 

Some (usually embedded, real-time) applications require that the processing time be independent of the data. In that case, a hash table can guarantee a known execution time, while a trie varies based on the data.

Adam Liss
Most hash tables don't guarantee a known execution time - the worst case is O(n), if every element collides and gets chained
Adam Rosenfield
For any data set, you can compute a perfect hash function that will guarantee O(1) lookups for that data. Of course, computing the perfect hash ain't free.
George V. Reilly
Also, chaining is not the only way to handle collisions; there are all sorts of interesting, clever ways to handle this—cuckoo hashing (http://en.wikipedia.org/wiki/Cuckoo_hashing) for one—and the best choice depends on the needs of the client code.
Hank Gay
+15  A: 

Advantages of tries:

The basics:

  • Predictable O(k) lookup time where k is the size of the key
  • Lookup can take less than k time if it's not there
  • Supports ordered traversal
  • No need for a hash function
  • Deletion is straightforward

New operations:

  • You can quickly look up prefixes of keys, enumerate all entries with a given prefix, etc.

Advantages of linked structure:

  • If there are many common prefixes, the space they require is shared.
  • Immutable tries can share structure. Instead of updating a trie in place, you can build a new one that's different only along one branch, elsewhere pointing into the old trie. This can be useful for concurrency, multiple simultaneous versions of a table, etc.
  • An immutable trie is compressible. That is, it can share structure on the suffixes as well, by hash-consing.

Advantages of hashtables:

  • Everyone knows hashtables, right? Your system will already have a nice well-optimized implementation, faster than tries for most purposes.
  • Your keys need not have any special structure.
  • More space-efficient than the obvious linked trie structure
Darius Bacon