+19  Q: 

Red-Black Trees

I've seen binary trees and binary searching mentioned in several books I've read lately, but as I'm still at the beginning of my studies in Computer Science, I've yet to take a class that's really dealt with algorithms and data structures in a serious way.

I've checked around the typical sources (Wikipedia, Google) and most descriptions of the usefulness and implementation of (in particular) Red-Black trees have come off as dense and difficult to understand. I'm sure for someone with the necessary background, it makes perfect sense, but at the moment it reads like a foreign language almost.

So what makes binary trees useful in some of the common tasks you find yourself doing while programming? Beyond that, which trees do you prefer to use (please include a sample implementation) and why?

+19  A: 

Red Black trees are good for creating well-balanced trees. The major problem with binary search trees is that you can make them unbalanced very easily. Imagine your first number is a 15. Then all the numbers after that are increasingly smaller than 15. You'll have a tree that is very heavy on the left side and has nothing on the right side.

Red Black trees solve that by forcing your tree to be balanced whenever you insert or delete. It accomplishes this through a series of rotations between ancestor nodes and child nodes. The algorithm is actually pretty straightforward, although it is a bit long. I'd suggest picking up the CLR (Cormen, Lieserson, Rivest and Stein) textbook, "Introduction to Algorithms" and reading up on RB Trees.

The implementation is also not really so short so it's probably not really best to include it here. Nevertheless, trees are used extensively for high performance apps that need access to lots of data. They provide a very efficient way of finding nodes, with a relatively small overhead of insertion/deletion. Again, I'd suggest looking at the CLR textbook to read up on how they're used.

While BSTs may not be used explicitly - one example of the use of trees in general are in almost every single modern RDBMS. Similarly, your file system is almost certainly represented as some sort of tree structure, and files are likewise indexed that way. Trees power Google. Trees power just about every website on the internet.

+4  A: 

Red Black Trees and B-trees are used in all sorts of persistent storage; because the trees are balanced the performance of breadth and depth traversals are mitigated.

Nearly all modern database systems use trees for data storage.

+2  A: 

BSTs make the world go round, as said by Micheal. If you're looking for a good tree to implement, take a look at AVL trees (Wikipedia). They have a balancing condition, so they are guaranteed to be O(logn). This kind of searching efficiency makes it logical to put into any kind of indexing process. The only thing that would be more efficient would be a hashing function, but those get ugly quick, fast, and in a hurry. Also, you run into the Birthday Paradox (also known as the pigeon-hole problem).

What textbook are you using? We used Data Structures and Analysis in Java by Mark Allen Weiss. I actually have it open in my lap as i'm typing this. It has a great section about Red-Black trees, and even includes the code necessary to implement all the trees it talks about.

+8  A: 

I'd like to address only the question "So what makes binary trees useful in some of the common tasks you find yourself doing while programming?"

This is a big topic that many people disagree on. Some say that the algorithms taught in a CS degree such as binary search trees and directed graphs are not used in day-to-day programming and are therefore irrelevant. Others disagree, saying that these algorithms and data structures are the foundation for all of our programming and it is essential to understand them, even if you never have to write one for yourself. This filters into conversations about good interviewing and hiring practices. For example, Steve Yegge has an article on interviewing at Google that addresses this question. Remember this debate; experienced people disagree.

In typical business programming you may not need to create binary trees or even trees very often at all. However, you will use many classes which internally operate using trees. Many of the core organization classes in every language use trees and hashes to store and access data.

If you are involved in more high-performance endeavors or situations that are somewhat outside the norm of business programming, you will find trees to be an immediate friend. As another poster said, trees are core data structures for databases and indexes of all kinds. They are useful in data mining and visualization, advanced graphics (2d and 3d), and a host of other computational problems.

I have used binary trees in the form of BSP (binary space partitioning) trees in 3d graphics. I am currently looking at trees again to sort large amounts of geocoded data and other data for information visualization in Flash/Flex applications. Whenever you are pushing the boundary of the hardware or you want to run on lower hardware specifications, understanding and selecting the best algorithm can make the difference between failure and success.

Jonathan Branam
For geocoded data check out R-Tree o Quad-Tree.
+1  A: 

The best description of red-black trees I have seen is the one in Cormen, Leisersen and Rivest's 'Introduction to Algorithms'. I could even understand it enough to partially implement one (insertion only). There are also quite a few applets such as This One on various web pages that animate the process and allow you to watch and step through a graphical representation of the algorithm building a tree structure.

CLR does have a pretty good description. I have an early version of the CLR though (took comp sci classes about 20 years ago) and unfortunately, they leave coding the "Delete" function as an exercise to the reader.

If you would like to see how a Red-Black tree is supposed to look graphically, I have coded an implementation of a Red-Black tree that you can download here

Brock Woolf
@Brock: Page Not Found 2010-08-22
+1  A: 

IME, almost no one understands the RB tree algorithm. People can repeat the rules back to you, but they don't understand why those rules and where they come from. I am no exception :-)

For this reason, I prefer the AVL algorithm, because it's easy to comprehend. Once you understand it, you can then code it up from scratch, because it make sense to you.

Blank Xavier

None of the answers mention what it is exactly BSTs are good for.

If what you want to do is just lookup by values then a hashtable is much faster, O(1) insert and lookup (amortized best case).

A BST will be O(log N) lookup where N is the height of the tree, inserts are also O(log N).

RB and AVL trees are important like another answer mentioned because of this property, if a plain BST is created with in-order values then the tree will be as high as the number of values inserted, this is bad for lookup performance.

The difference between RB and AVL trees are in the the rotations required to rebalance after an insert or delete, AVL trees are O(log N) for rebalances while RB trees are O(1). An example of benefit of this constant complexity is in a case where you might be keeping a persistent data source, if you need to track changes to roll-back you would have to track O(log N) possible changes with an AVL tree.

Why would you be willing to pay for the cost of a tree over a hash table? ORDER! Hash tables have no order, BSTs on the other hand are always naturally ordered by virtue of their structure. So if you find yourself throwing a bunch of data in an array or other container and then sorting it later, a BST may be a better solution.

The tree's order property gives you a number of ordered iteration capabilities, in-order, depth-first, breadth-first, pre-order, post-order. These iteration algorithms are useful in different circumstances if you want to look them up.

Red black trees are used internally in almost every ordered container of language libraries, C++ Set and Map, .NET SortedDictionary, Java TreeSet, etc...

So trees are very useful, and you may use them quite often without even knowing it. You most likely will never need to write one yourself, though I would highly recommend it as an interesting programming exercise.


Trees can be fast. If you have a million nodes in a balanced binary tree, it takes twenty comparisons on average to find any one item. If you have a million nodes in a linked list, it takes five hundred thousands comparisons on average to find the same item.

If the tree is unbalanced, though, it can be just as slow as a list, and also take more memory to store. Imagine a tree where most nodes have a right child, but no left child; it is a list, but you still have to hold memory space to put in the left node if one shows up.

Anyways, the AVL tree was the first balanced binary tree algorithm, and the Wikipedia article on it is pretty clear. The Wikipedia article on red-black trees is clear as mud, honestly.

Beyond binary trees, B-Trees are trees where each node can have many values. B-Tree is not a binary tree, just happens to be the name of it. They're really useful for utilizing memory efficiently; each node of the tree can be sized to fit in one block of memory, so that you're not (slowly) going and finding tons of different things in memory that was paged to disk. Here's a phenomenal example of the B-Tree.

Dean J