How does Trie and B+ tree compare for indexing lexicographically sorted strings [on the order some billions]? It should support range queries as well.
From perf. as well as implementation complexity point of view.
How does Trie and B+ tree compare for indexing lexicographically sorted strings [on the order some billions]? It should support range queries as well.
From perf. as well as implementation complexity point of view.
Depends on your actual task.
If you want to get all subtree then B+Tree is your choice because of it is space efficient. But if you want to get first N
children from substree, that Trie is the best choice because of you simply visit less nodes than in B+ Tree scenario. The most popular task which is handled good by Trie is a word prefix completion.
I would say it depends on what you mean by Range.
If your range is expressed as All words beginning by, then a Trie
is the right choice I'd say. On the other hand, Trie
are not meant for requests like All words between XX and ZZ.
Note that the branching factor of the B+ Tree
affects its performance (the number of intermediary nodes). If h
is the height of the tree, then nmax ~~ bh. Therefore h ~~ log(nmax) / log(b).
With n = 1 000 000 000
and b = 100
, we have h ~~ 5
. Therefore it means only 5 pointer dereferencing for going from the root to the leaf. It's more cache-friendly than a Trie
.
Finally, B+ Tree
is admittedly more difficult to implement than a Trie
: it's more on a Red-Black Tree
level of complexity.