views:

80

answers:

1

For processing language, as in regular dictionary words, which would be faster at reading, a radix tree, or a regular b-tree? Is there a faster method, such as a dictionary with buckets & hashing?

+1  A: 

As always, you will need to benchmark in your application context to be sure.

However, I expect that in this case a well-implemented hash table will probably prove to be fastest. This basically requires:

  • One scan through the string to calculate the hash value, typically using very fast operations such as bit shifting / XORs
  • One hash table lookup based on the hash value
  • One string comparison to confirm you have the right word
  • A little extra processing in the case that there is a hash collision - however you can tune your hashtable size to minimise this

A radix tree will also be very fast, there is just a little bit of extra overhead due to the need to traverse multiple levels of tree nodes. If your tree is relatively sparse, it's likely that may lookups will only need to go down a small number of levels to find a unique answer. One advantage of the radix tree is that it will tell you very early if you have no possible matches (e.g. an empty branch for the tree starting with "qq")

A binary tree will probably be the slowest since it will on average have to search through quite a few levels of tree nodes. However it will still be fast enough for most purposes.

mikera
I think for the purpose of what I want, and based on what you've said, a radix tree will win. It has the further advantage of automatically knowing what the root of a word is.
IanC