The worst-case running time of insertion into a red-black tree is O(lg n), so inserting n elements takes O(n lg n) in the worst case. An in-order walk then visits each node once in O(n), so the total worst-case runtime to print the sorted collection would be O(n lg n).

I am curious: why are red-black trees not preferred for sorting over quicksort (whose average-case running time is O(n lg n))?

I suspect it may be because red-black trees do not sort in place, but I am not sure, so maybe someone could help.
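
For concreteness, here is a minimal sketch of the tree-sort idea described in the question. It uses a plain unbalanced BST in Python purely for brevity (the `Node`, `insert`, and `tree_sort` names are illustrative); a real red-black tree would add recoloring and rotations on insert to actually guarantee the O(lg n) insertion bound.

    # Minimal sketch of tree sort as described above: insert every element into a
    # binary search tree, then emit the keys with an in-order walk.
    # A plain (unbalanced) BST is used for brevity; a red-black tree would add
    # recoloring/rotations on insert to guarantee the O(lg n) insertion bound.

    class Node:
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None

    def insert(root, key):
        # Standard BST insert; O(lg n) per insert on a balanced (e.g. red-black) tree.
        if root is None:
            return Node(key)
        if key < root.key:
            root.left = insert(root.left, key)
        else:
            root.right = insert(root.right, key)
        return root

    def in_order(root, out):
        # In-order walk: visits each node exactly once, O(n) total.
        if root is not None:
            in_order(root.left, out)
            out.append(root.key)
            in_order(root.right, out)

    def tree_sort(values):
        root = None
        for v in values:            # n inserts -> O(n lg n) with a balanced tree
            root = insert(root, v)
        result = []
        in_order(root, result)      # O(n)
        return result

    print(tree_sort([5, 3, 8, 1, 9, 2]))   # [1, 2, 3, 5, 8, 9]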

+2  A: 

There are plenty of sorting algorithms whose worst case is O(n log n) - for example, merge sort. The reason quicksort is preferred is that it is faster in practice, even though algorithmically it may not be as good as some other algorithms.

Built-in sorts often use a combination of methods, switching between them depending on the value of n.
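
As a hedged illustration of that last point, here is a small hybrid-sort sketch in Python that falls back to insertion sort below an arbitrarily chosen cutoff; real library sorts (introsort, Timsort) use more sophisticated rules, and the `CUTOFF` value here is an assumption, not a tuned constant.

    # Hypothetical hybrid sort: quicksort for large ranges, insertion sort below a
    # small cutoff where its low constant factor wins. CUTOFF = 16 is illustrative.
    import random

    CUTOFF = 16

    def insertion_sort(a, lo, hi):
        for i in range(lo + 1, hi + 1):
            key = a[i]
            j = i - 1
            while j >= lo and a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key

    def hybrid_sort(a, lo=0, hi=None):
        if hi is None:
            hi = len(a) - 1
        if hi - lo + 1 <= CUTOFF:
            insertion_sort(a, lo, hi)
            return
        # Lomuto partition around the last element as pivot.
        pivot = a[hi]
        i = lo - 1
        for j in range(lo, hi):
            if a[j] <= pivot:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[hi] = a[hi], a[i + 1]
        hybrid_sort(a, lo, i)        # elements left of the pivot
        hybrid_sort(a, i + 2, hi)    # elements right of the pivot

    data = [random.randint(0, 999) for _ in range(200)]
    hybrid_sort(data)
    assert data == sorted(data)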

But merge sort does not sort in place, and how is quicksort faster in practice? (I never understood it, although I find it thrown at me every time I ask this question.)
Vaibhav Bajpai
All sorting algorithms have their own hidden constant factors, and while you could probably find papers on why this is the case with a bit of searching, deciding which actually performs faster purely on theory isn't that easy. "In practice" means exactly that: if you compare the sorting algorithms on real data, you will find that quicksort inevitably has a faster runtime.
Quicksort is not always faster. If you keep increasing the number of elements, an O(n log n) algorithm will inevitably beat an O(n^2) algorithm at some point. However, for small n the constant factors have a much stronger impact, and the O(n^2) algorithm may actually be the faster solution. Consider for example 10000*n and 100*n^2. At first, 100*n^2 yields smaller values, but at n=100 the linear function catches up and produces smaller values for all larger n (see the quick check after this comment). The effect is the same for quicksort, and for most "practical" n it is faster.
Gnafoo
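
A quick check of the crossover point in the example above (the specific numbers come from the comment; the loop itself is just illustrative):

    # 10000*n (linear) vs. 100*n^2 (quadratic): the quadratic cost is smaller
    # until n = 100, after which the linear cost wins for every larger n.
    for n in (10, 50, 100, 200, 1000):
        linear, quadratic = 10000 * n, 100 * n * n
        if quadratic < linear:
            winner = "quadratic is cheaper"
        elif quadratic == linear:
            winner = "tie"
        else:
            winner = "linear is cheaper"
        print(f"n={n:5d}: 10000*n={linear:9d}, 100*n^2={quadratic:11d} -> {winner}")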
@Gnafoo - The analysis in your example is correct, but in this case two algorithms with the *same* O(n log n) average time complexity are being compared. Average time complexity is usually what matters in most real-world applications; worst-case time-complexity is usually only relevant for those few scenarios with hard execution time bound requirements.
Joel Hoff
+1  A: 

Big-O time complexity measures do not usually take into account scalar factors, e.g., O(2n) and O(4n) are usually just reduced to O(n). Time complexity analysis is based on operational steps at an algorithmic level, not at a strict programming level, i.e., no source code or native machine instruction considerations.

Quicksort is generally faster than tree-based sorting since (1) the methods have the same algorithmic average time complexity, and (2) lookup and swapping operations require fewer program commands and data accesses when working with simple arrays than with red-black trees, even if the tree uses an underlying array-based implementation. Maintenance of the red-black tree constraints requires additional operational steps, data field value storage/access (node colors), etc., than the simple array partition-exchange steps of a quicksort.
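
To make the "partition-exchange" point concrete, here is a minimal in-place quicksort sketch (Hoare-style partitioning; the middle-element pivot choice is an illustrative assumption). The inner loop touches nothing but the one array and a couple of indices:

    def quicksort(a, lo=0, hi=None):
        # In-place quicksort: all work happens via index moves and swaps
        # inside the single array; no nodes, colors, or pointers to maintain.
        if hi is None:
            hi = len(a) - 1
        if lo >= hi:
            return
        pivot = a[(lo + hi) // 2]
        i, j = lo - 1, hi + 1
        while True:
            i += 1
            while a[i] < pivot:
                i += 1
            j -= 1
            while a[j] > pivot:
                j -= 1
            if i >= j:
                break
            a[i], a[j] = a[j], a[i]
        quicksort(a, lo, j)
        quicksort(a, j + 1, hi)

    data = [9, 4, 7, 1, 3, 8, 2]
    quicksort(data)
    print(data)   # [1, 2, 3, 4, 7, 8, 9]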

The net result is that red-black trees have higher scalar coefficients than quicksort does, and these are obscured by the standard O(n log n) average-time-complexity result.

Some other practical considerations related to machine architectures are briefly discussed in the Quicksort article on Wikipedia.

Joel Hoff
+2  A: 

Knowing which sort algorithm performs better really depends on your data and situation.

If you are talking in general/practical terms,

Quicksort (the version where you select the pivot randomly or just pick a fixed one, making the worst case Omega(n^2)) might be better than red-black trees because (not necessarily in order of importance):

  • Quicksort is in-place. This keeps your memory footprint low. Say this quicksort routine was part of a program that deals with a lot of data. If you kept using large amounts of memory, your OS could start swapping your process memory and trash your performance.

  • Quicksort's memory accesses are localized. This plays well with caching and swapping.

  • Quicksort can be easily parallelized (probably more relevant these days).

  • If you were to try to optimize binary tree sorting (using a binary tree without balancing) by using an array instead, you would end up doing something like quicksort!

  • Red-black trees have memory overhead. You have to allocate nodes, possibly many times, and your memory requirements with trees can be double or triple those of arrays (see the rough sketch after this list).

  • After sorting, say you want the 1045th element: you would need to maintain order statistics in your tree (an extra memory cost) and access would still be O(log n), whereas a sorted array gives it to you in O(1)!

  • Red-black trees have overhead just to access the next element (pointer lookups).

  • Red-black trees do not play well with the cache, and the pointer accesses could induce more swapping.

  • Rotations in red-black trees increase the constant factor hidden in the O(n log n).

  • Perhaps the most important reason (though not valid if you have a library available): quicksort is very simple to understand and implement. Even a schoolkid can understand it!
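
As a rough (and CPython-specific, so only indicative) way to see the per-element overhead from the memory bullets above, the sketch below compares a flat list of keys with one heap-allocated node object per key; the `RBNode` class and its fields are illustrative assumptions:

    # Illustrative only: the exact byte counts are CPython implementation details.
    # The point is the extra per-key fields (color, left, right, parent) and the
    # one-allocation-per-element pattern, versus one contiguous block of slots.
    import sys

    class RBNode:
        __slots__ = ("key", "color", "left", "right", "parent")
        def __init__(self, key):
            self.key = key
            self.color = "RED"
            self.left = self.right = self.parent = None

    keys = list(range(1000))            # contiguous array of references
    nodes = [RBNode(k) for k in keys]   # one separate heap object per key

    print("bytes per array slot ~", sys.getsizeof(keys) // len(keys))
    print("bytes per tree node  ~", sys.getsizeof(nodes[0]))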

I would say you should try to measure both implementations and see what happens!
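
In that spirit, here is a rough, hypothetical measurement harness: an in-place quicksort against a node-based tree sort on random data. The tree is a plain BST (random input keeps it roughly balanced); a real red-black tree would add rebalancing work on top of this, and the input size is an arbitrary choice:

    # Rough benchmark sketch in the spirit of "measure both and see".
    import random
    import time

    def quicksort(a, lo=0, hi=None):
        # In-place quicksort with a middle-element pivot.
        if hi is None:
            hi = len(a) - 1
        while lo < hi:
            pivot = a[(lo + hi) // 2]
            i, j = lo, hi
            while i <= j:
                while a[i] < pivot:
                    i += 1
                while a[j] > pivot:
                    j -= 1
                if i <= j:
                    a[i], a[j] = a[j], a[i]
                    i += 1
                    j -= 1
            quicksort(a, lo, j)   # recurse on the left part...
            lo = i                # ...and iterate on the right part

    class Node:
        __slots__ = ("key", "left", "right")
        def __init__(self, key):
            self.key, self.left, self.right = key, None, None

    def tree_sort(values):
        root = None
        for v in values:                      # one heap allocation per element
            node, parent, cur = Node(v), None, root
            while cur is not None:
                parent = cur
                cur = cur.left if v < cur.key else cur.right
            if parent is None:
                root = node
            elif v < parent.key:
                parent.left = node
            else:
                parent.right = node
        out, stack, cur = [], [], root        # iterative in-order walk
        while stack or cur:
            while cur:
                stack.append(cur)
                cur = cur.left
            cur = stack.pop()
            out.append(cur.key)
            cur = cur.right
        return out

    n = 200_000
    data = [random.random() for _ in range(n)]

    a = data[:]
    t0 = time.perf_counter()
    quicksort(a)
    t_quick = time.perf_counter() - t0

    t0 = time.perf_counter()
    b = tree_sort(data)
    t_tree = time.perf_counter() - t0

    assert a == b == sorted(data)
    print(f"quicksort: {t_quick:.3f}s, tree sort: {t_tree:.3f}s")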

Also, Bob Sedgewick did a thesis on quicksort! Might be worth reading.

Moron
Sedgewick also had a hand in the invention of red-black trees and left-leaning red-black trees.
James McNellis
A: 

Generally, the running time of O(n lg n) algorithms can be expanded to A*n lg n + B, where A and B are constants. There are many analyses showing that the coefficients for quicksort are smaller than those of other algorithms. That holds in the best and average cases, though (quicksort with a naive pivot choice performs horribly on already-sorted data).
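
A minimal sketch of what estimating those constants can look like in practice, using Python's built-in sort and two arbitrary input sizes as assumptions (this fits A and B to just two measurements, so treat the result as a rough ballpark):

    # Fit T(n) ~ A*n*lg(n) + B from timings at two sizes; the sizes are illustrative.
    import math
    import random
    import time

    def time_sort(n, trials=5):
        # Average wall-clock time to sort n random floats with the built-in sort.
        total = 0.0
        for _ in range(trials):
            data = [random.random() for _ in range(n)]
            start = time.perf_counter()
            sorted(data)
            total += time.perf_counter() - start
        return total / trials

    n1, n2 = 100_000, 400_000
    t1, t2 = time_sort(n1), time_sort(n2)

    # Solve the two equations t = A*n*lg(n) + B for A and B.
    x1, x2 = n1 * math.log2(n1), n2 * math.log2(n2)
    A = (t2 - t1) / (x2 - x1)
    B = t1 - A * x1
    print(f"A ~ {A:.3e} s per (n lg n) step, B ~ {B:.3e} s")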

Anon