views:

3015

answers:

4

I'm currently implementing a radix tree/patricia trie (whatever you want to call it). I want to use it for prefix searches in a dictionary on a severely underpowered piece of hardware. It's supposed to work more or less like auto-completion, i. e. showing a list of words that the typed prefix matches.

My implementation is based on this article, but the code therein doesn't include prefix searches, though the author says:

[...] Say you want to enumerate all the nodes that have keys with a common prefix "AB". You can perform a depth first search starting at that root, stopping whenever you encounter back edges.

But I don't see how that is supposed to work. For example, if I build a radix tree from these words:

illness
imaginary
imagination
imagine
imitation
immediate
immediately
immense
in

I will get the exact same "best match" for the prefixes "i" and "in" so that it seems difficult to me to gather all matching words just by traversing the tree from that best match.

Additionally, there is a radix tree implementation in Java that has an implemented prefix search in RadixTreeImpl.java. That code explicitly checks all nodes (starting from a certain node) for a prefix match - it actually compares bytes.

Can anyone point me to a detailed description on implementing a prefix search on radix trees? Is the algorithm used in the Java implementation the only way to do it?

+1  A: 
Charlie Martin
It's quite possible that my understanding of patricia tries is flawed, but I believe that such tries store - in contrast to normal tries - complete words and always have two child nodes.In my example, I have a root node which has a self reference and "illness" as a child. "illness" in turn has "in" as a child and also a self reference. "imaginary" is a child of "in" and has a reference to "illness" as well as to "imitation". And so on...
j66k
You're right that what I wrote talks about a trie in general; PATRICIA tries use less space, but are more complicated. The basic idea is the same though. Here's another nice page: http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/PATRICIA/
Charlie Martin
Basically, a PATRICIA complresses out many of the nodes by storing strings for long chains of single nodes.
Charlie Martin
Which means that I have to compare the string's characters with the prefix I search for, right?
j66k
well, you're pretty much going to ahve to no matter what. i'm not sure I understand the question.
Charlie Martin
I was hoping I could skip comparing individual characters after finding the "best match" as no such thing is explicitly mentioned in the article at CodeProject I cited and linked to.Now that I read it again I begin to think by "perform a depth first search" the author means one has to compare the string to the prefix.
j66k
Yeah, you'll certainly have to complete the search at least as far as the shortest unique prefix.
Charlie Martin
A: 

An alternative algorithm: Keep It Simple Stupid!

Just make a sorted list of your keywords. When you have a prefix, binary search to find where that prefix would be located in the list. All of your possible completions will be found starting at that index, ready to be accessed in place.

This algorithm will will require only 5% of the code of a Patricia trie and will be easy to maintain, understand, and update. It is almost certain this simple list search will be more efficient as well.

The only downside is if you have huge numbers of long keywords with similar prefixes, a trie can save some storage since it doesn't need to keep the full prefix for every entry. In practice, if you have less than a few million words, this is not a savings because the pointer overhead of the tree will dominate. This savings is more for applications like searching databases of DNA strings with millions of characters, not text keywords.

SPWorley
Maybe I should just compare those two solutions to each other considering that my patricia trie is basically done.
j66k
+2  A: 

PATL - C++ template library
Prefix search can be implemented as follows:

typedef patl::trie_set<std::string> trie_t;
trie_t trie;
/* ... fill trie with strings ... */
std::pair<trie_t::const_iterator, trie_t::const_iterator> =
  trie.equal_range("image", 2 * 8); // prefix size in bits = 2 chars = im*
uxn
A: 

It turns out the GNU extensions for the standard c++ lib includes a Patricia trie implementation. It's found under the policy-based data-structures extension. See http://gcc.gnu.org/onlinedocs/libstdc++/ext/pb_ds/trie_based_containers.html

TG