ansaurus

Question

Answer 1

+2 A:

I don't see anything wrong with your approach. If you're looking for a spike solution, perhaps the action taken in case 4 is actually feasible for the first three cases as well, i.e., find the common prefix of k and k' and rebuild the node with that in mind. If it happens that the keys were prefixes of one another, the resulting trie will still be correct; the implementation will just have done a bit more work than it strictly needed to. But then again, without any code to look at, it's hard to say whether this works in your case.

TokenMacGuy 2009-06-07 01:21:00

Thanks for your fast reply. The 4th case would be inserting "stackbattle" in the example above: we would have to create a new node "ba", with a new node "ttle" on the left and, on the right, the old subtrie rooted at "base" (now renamed to "se"). Cases 1-3 are, AFAIK, fundamentally different: in those cases we never have to create two new nodes.

jacob 2009-06-07 01:29:35
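A minimal sketch of the uniform approach suggested in the answer, assuming a radix (path-compressed) trie whose edges carry string labels and whose children are keyed by the first character of each label. The asker's code isn't shown, so `RadixTrie`, `RadixNode` and `common_prefix_len` are hypothetical names. Every insertion computes the common prefix of the remaining key and the matching edge label and splits the edge only when that prefix is shorter than the label, so the "stackbattle" case from the comment and the simpler cases all go through the same loop.

```python
class RadixNode:
    def __init__(self):
        # first character of the edge label -> (label, child node)
        self.children = {}
        self.terminal = False  # True if a stored key ends at this node


def common_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


class RadixTrie:
    def __init__(self):
        self.root = RadixNode()

    def insert(self, key):
        node = self.root
        while key:
            edge = node.children.get(key[0])
            if edge is None:
                # No edge starts with this character: hang the rest of the key off a new leaf.
                leaf = RadixNode()
                leaf.terminal = True
                node.children[key[0]] = (key, leaf)
                return
            label, child = edge
            n = common_prefix_len(label, key)
            if n < len(label):
                # Split the edge: label[:n] now leads to a new middle node whose
                # only edge is the old remainder label[n:] (e.g. "base" -> "ba" + "se").
                mid = RadixNode()
                mid.children[label[n]] = (label[n:], child)
                node.children[key[0]] = (label[:n], mid)
                child = mid
            node = child
            key = key[n:]
        # Key fully consumed: covers "key equals a label" and "key is a prefix of a label".
        node.terminal = True

    def contains(self, key):
        node = self.root
        while key:
            edge = node.children.get(key[0])
            if edge is None or not key.startswith(edge[0]):
                return False
            label, node = edge
            key = key[len(label):]
        return node.terminal


trie = RadixTrie()
for word in ["stack", "stackbase", "stackbattle"]:
    trie.insert(word)
```

After the third insertion, the edge "base" below "stack" has become "ba", and its node has the two edges "se" and "ttle", as described in the comment. When a new key is a prefix of an existing label, the same split simply marks the new middle node as terminal, so no separate case is needed.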

Answer 2

+6 A:

At a glance, it sounds like you've implemented a Patricia trie. This approach is also called path compression in some of the literature. There should be copies of the original Patricia paper that aren't behind the ACM paywall, and they will include an insertion algorithm.

There's also another compression method you may want to look at: level compression. The idea behind path compression is to replace chains of single-child nodes with a single super node that has a "skip" count. The idea behind level compression is to replace full or nearly full subtrees with a super node that has a "degree" count saying how many digits of the key the node decodes. There's also a third approach called width compression, but I'm afraid my memory fails me, and I couldn't find a description of it with a quick search.

Level compression can shorten the average path considerably, but the insertion and removal algorithms get quite complicated, as they need to manage the trie nodes similarly to dynamic arrays. For the right data sets, level-compressed tries can be fast. From what I remember, they're the second-fastest approach for storing IP routing tables; the fastest is some sort of hash trie.

Jason Watkins 2009-06-07 02:09:25
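A minimal sketch of path and level compression combined, assuming fixed-width binary keys given as strings of '0' and '1' (a simplification: tries for routing tables work on integer bit fields and also have to handle prefixes of different lengths). Each internal node stores a `skip` count (path compression) and a `degree` (level compression). This builder only collapses completely full subtrees, not nearly full ones, and it is static: the structure is built once, which sidesteps the complicated insertion and removal mentioned above. `Leaf`, `Node`, `build` and `contains` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    key: str  # full key stored at the leaf; compared at the end of a lookup


@dataclass
class Node:
    skip: int    # path compression: bits skipped before this node decodes
    degree: int  # level compression: bits decoded by this node
    children: list = field(default_factory=list)  # exactly 2**degree children


def build(keys, pos=0):
    """Build from distinct, equal-length bit strings that all share k[:pos]."""
    keys = sorted(keys)
    if len(keys) == 1:
        return Leaf(keys[0])
    width = len(keys[0])
    # Path compression: skip every bit position on which all keys agree.
    skip = 0
    while len({k[pos + skip] for k in keys}) == 1:
        skip += 1
    pos += skip
    # Level compression: grow the degree while the subtree stays completely
    # full, i.e. every one of the 2**(degree + 1) bit patterns occurs.
    degree = 1
    while (pos + degree < width
           and len({k[pos:pos + degree + 1] for k in keys}) == 2 ** (degree + 1)):
        degree += 1
    children = []
    for i in range(2 ** degree):
        pattern = format(i, f"0{degree}b")
        group = [k for k in keys if k[pos:pos + degree] == pattern]
        children.append(build(group, pos + degree))
    return Node(skip, degree, children)


def contains(node, key):
    """Look up a key of the same width; skipped bits are only checked at the leaf."""
    pos = 0
    while isinstance(node, Node):
        pos += node.skip
        index = int(key[pos:pos + node.degree], 2)
        pos += node.degree
        node = node.children[index]
    return node.key == key


root = build({"0000", "0001", "0010", "0011", "1100"})
```

Here the four keys starting with `00` form a complete subtree below bit position 2, so a single degree-2 node decodes them instead of two levels of binary nodes, and the shared `0` at position 1 is absorbed into that node's skip count.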

There are some implementations of Patricia tries at the National Institute of Standards and Technology web site (http://www.itl.nist.gov/div897/sqg/dads/HTML/patriciatree.html)

Kathy Van Stone 2009-06-07 02:19:11

Thanks, Jason, for the reference and advice! Hashing might also be a good technique when it gets dense. But let's keep it simple with respect to insertions :)

jacob 2009-06-07 03:01:53

Thanks, Kathy, for the link.

jacob 2009-06-07 03:02:12

Answer 3

+2 A:

Somewhat of a tangent, but if you are super worried about the number of nodes in your Trie, you may look at joining your word suffixes too. I'd take a look at the DAWG (Directed Acyclic Word Graph) idea: http://en.wikipedia.org/wiki/Directed_acyclic_word_graph

The downside of these is that they aren't very dynamic and creating them can be difficult. But, if your dictionary is static, they can be super compact.

Joe Beda 2009-06-07 05:33:28
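As a rough illustration of the suffix-sharing idea, here is a minimal sketch, assuming a static word list: build an ordinary trie, then merge identical subtrees bottom-up using a registry keyed by each node's (final flag, outgoing edges) signature. The result is the minimal acyclic automaton for the word list, which is what is usually meant by a DAWG. The names `Node`, `build_trie`, `minimize`, `count_nodes` and `contains` are hypothetical. Note that the whole trie has to exist before it is compressed, which reflects the "not very dynamic" downside mentioned above.

```python
class Node:
    def __init__(self):
        self.final = False  # a word ends at this node
        self.edges = {}     # character -> Node


def build_trie(words):
    root = Node()
    for word in words:
        node = root
        for ch in word:
            child = node.edges.get(ch)
            if child is None:
                child = node.edges[ch] = Node()
            node = child
        node.final = True
    return root


def minimize(node, registry):
    """Replace the subtree with canonical shared nodes, bottom-up (post-order)."""
    for ch, child in list(node.edges.items()):
        node.edges[ch] = minimize(child, registry)
    # Children are canonical now, so equal subtrees point at identical objects.
    signature = (node.final,
                 tuple(sorted((ch, id(c)) for ch, c in node.edges.items())))
    return registry.setdefault(signature, node)


def count_nodes(root):
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if id(node) not in seen:
            seen.add(id(node))
            stack.extend(node.edges.values())
    return len(seen)


def contains(root, word):
    node = root
    for ch in word:
        node = node.edges.get(ch)
        if node is None:
            return False
    return node.final


words = ["tap", "taps", "top", "tops"]
trie_size = count_nodes(build_trie(words))  # 8 nodes in the plain trie
dawg = minimize(build_trie(words), {})      # 5 nodes after sharing suffixes
```

In this example the "ta" and "to" branches end in identical subtrees ("p", optionally followed by "s"), so the DAWG stores that suffix structure once.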

Answer 4

+2 A:

I have a question regarding your implementation: at what level of granularity do you split your strings to build the prefix tree? You could split "stack" into s, t, a, c, k, or into st, ta, ac, ck, or into many other n-grams of it. Most prefix tree implementations take into account an alphabet for the language, and the splitting is based on that alphabet.

If you were building a prefix tree for Python code, then your alphabet would contain symbols like def, :, if, else, etc.

Choosing the right alphabet makes a huge difference in building efficient prefix trees. As for your question, you could look at Perl packages on CPAN that do longest-common-substring computation using tries. You may have some luck there, as most of their implementations are pretty robust.

Ritesh M Nayak 2009-06-07 05:46:21

I'm not using a fixed alphabet, so as to allow arbitrary strings. I use a hash table to determine whether a link already exists or not.

jacob 2009-06-07 14:19:27
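To illustrate both points in this thread (the choice of alphabet, and a per-node hash table instead of a fixed alphabet), here is a minimal sketch of a trie whose nodes keep their children in a Python `dict`, so any hashable symbol can label an edge. The same class is fed single characters and, via the standard library's `tokenize` module, Python tokens, so that `def` or `:` becomes a single symbol. `SymbolTrie` and `python_symbols` are hypothetical names.

```python
import io
import tokenize


class SymbolTrie:
    """Trie over sequences of arbitrary hashable symbols (no fixed alphabet)."""

    def __init__(self):
        self.children = {}     # hash table: symbol -> child SymbolTrie
        self.terminal = False  # a stored sequence ends here

    def insert(self, symbols):
        node = self
        for s in symbols:
            # The dict lookup decides whether the link already exists.
            child = node.children.get(s)
            if child is None:
                child = node.children[s] = SymbolTrie()
            node = child
        node.terminal = True

    def contains(self, symbols):
        node = self
        for s in symbols:
            node = node.children.get(s)
            if node is None:
                return False
        return node.terminal

    def size(self):
        return 1 + sum(child.size() for child in self.children.values())


def python_symbols(source):
    """Split Python source into tokens such as 'def', 'f', '(' and ':'."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    # Drop tokens with no visible text (NEWLINE, ENDMARKER, ...).
    return [tok.string for tok in tokens if tok.string.strip()]


chars = SymbolTrie()
for word in ["stack", "stackbase", "stackbattle"]:
    chars.insert(word)  # alphabet = single characters

tokens = SymbolTrie()
for line in ["def f(x): return x", "def g(): pass"]:
    tokens.insert(python_symbols(line))  # alphabet = Python tokens
```

With the token alphabet, `def` is one edge shared by both lines rather than three character edges, which is the kind of difference in tree shape the answer is pointing at.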

Answer 5

+1 A:

Look at Judy arrays and the Python interface at http://www.dalkescientific.com/Python/PyJudy.html

2009-06-13 08:05:25

Great reference, thanks!

jacob 2009-06-14 10:50:42

Trie (Prefix Tree) in Python
