I want to learn more about "trie" data structures. I've read about them on Wikipedia but I'm still confused.
What is the most common use? Auto-complete in the address bar of a web browser... or?
Cheers!
I want to learn more about "trie" data structures. I've read about them on Wikipedia but I'm still confused.
What is the most common use? Auto-complete in the address bar of a web browser... or?
Cheers!
I've used it to find popular prefixes and suffixes (when applied to reversed strings) to remove redundant parts of document titles in a collection.
You can also use it to easily create regular expression that matches given set of strings foo,bar,baz
→ (foo|ba(r|z))
Don't have a source, but I heard they are used in the T9 text prediction on many cell phones. Allows for easy additions of new words.
Mine has been permanently scarred with some typos.
A Trie (I prefer to call it a prefix tree), could be a good data structure for building a memory-efficient dictionary with fast lookups, and yes, autocompletion.
Think of it as a hashtable, providing fast lookup of key-value pairs (or just lookup of keys), but unlike a hashtable it allows you to iterate over the keys in sorted order.
I've been using modified trie's in my Internet Filter product since 1995. Probably most internet filters on the market utilize something like it. The advantage is that the time to find a match is related to the length of the match, not the number of items in the data structure. Also see my article: Fast Tree
--jeffk++
Another use is the Aho-Corasick string searching algorithm which is based on a trie.
You will find trie structures at the heart of many bioinformatic applications ( DNA sequence alignment, etc ). See: Trie-based data structures for sequence assembly (pdf)
The trie is also at the heart of one of the fastest known sorting algorithms: Burstsort. The implementation involves building a trie and then traversing it depth-first. It's speed is attained from it's cache-efficient properties. See: Efficient Trie-Based Sorting of Large Sets of Strings (pdf)
It's worth noting that Hadoop recently set a sort benchmark record. It's implementation uses trie structures: Terasort Source Code
Clojure's wicked-cool immutable Map and Vector types are implemented using a trie under the covers. Consider, for example the implementation of PersistentHashMap.
The reason I say they ware "wicked cool" is because they are immutable, providing great advantages for concurrent programming. (You don't need to synchronized access to something that can't change.) At the same time, you can generate new altered copies of a collection very efficiently because they don't actually copy, they share structure with the source collection.
Can someone point me to an efficient implementation of Trie data structure.