
What are the use cases when a particular sorting algorithm is preferred: merge sort vs. quicksort vs. heap sort vs. introsort, etc.? Is there a recommended guide for choosing among them based on the size and type of data structure, available memory and cache, and CPU performance?

thanks,

Sam

+6  A: 

A set of animations for different kinds of data and algorithms can be found at sorting-algorithms.com

Chip Uni
OK, +1 because that's kinda cool.
GrayWizardx
This doesn't answer the question.
Ori Pessach
OK, maybe it does.
Ori Pessach
It does, indirectly. The animation gives a good sense of when each algorithm is going to perform best or worst.
Mathias
Yes, it does answer the question. Click on each algorithm to get a long description of it, when it's best, and its properties.
Chip Uni
A: 

Yes, every algorithm has advantages and disadvantages depending on the problem you're trying to solve. For example, heapsort doesn't use any extra space, which is great if you are working with very large lists.

Importance of algorithms

pcp
Right, but what are the recommendations?
sam
+1  A: 

The Wikipedia page on sorting algorithms has a great comparison chart.

http://en.wikipedia.org/wiki/Sorting%5Falgorithm#Comparison%5Fof%5Falgorithms

Dan Lorenc
+2  A: 

What the provided links to comparisons/animations do not consider is what happens when the amount of data exceeds available memory, at which point the number of passes over the data (i.e., I/O costs) dominates the runtime. If you need to do that, read up on "external sorting", which usually covers variants of merge sort and heap sort.
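A minimal sketch of the merge-based variant, assuming the input is a stream of integers and that `run_size` values (a hypothetical parameter) fit in memory at once: sort fixed-size runs, spill each run to a temporary file, then do a k-way merge with `heapq.merge`, which keeps only one value per run in memory. For brevity the merged output is collected into a list; a real external sort would stream it back to disk.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def _write_run(sorted_run: List[int]) -> str:
    """Spill one sorted run to a temporary file, one integer per line."""
    fd, path = tempfile.mkstemp(suffix=".run")
    with os.fdopen(fd, "w") as f:
        for value in sorted_run:
            f.write(f"{value}\n")
    return path


def _read_run(path: str) -> Iterator[int]:
    """Stream a run back from disk without loading it all into memory."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort(values: Iterable[int], run_size: int) -> List[int]:
    paths: List[str] = []
    run: List[int] = []
    try:
        # Pass 1: build sorted runs that each fit in memory.
        for value in values:
            run.append(value)
            if len(run) == run_size:
                run.sort()
                paths.append(_write_run(run))
                run = []
        if run:
            run.sort()
            paths.append(_write_run(run))
        # Pass 2: k-way merge of all runs (one pass over the data on disk).
        return list(heapq.merge(*(_read_run(p) for p in paths)))
    finally:
        for p in paths:
            os.remove(p)
```

The point of the design is that the data is read and written a small, fixed number of times, which is what matters once I/O dominates.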

http://corte.si/posts/code/visualisingsorting/index.html and http://corte.si/posts/code/timsort/index.html also have some cool images comparing various sorting algorithms.

Alex Brasetvik
+2  A: 

Quicksort is usually the fastest on average, but it has some pretty nasty worst-case behaviors. So if you must guarantee that no input can trigger O(N^2) behavior, you should avoid it.
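To make the worst case concrete, here is a deliberately naive sketch (not a production quicksort) that always uses the first element as the pivot and counts comparisons. On already-sorted input every partition is maximally lopsided, so the count is exactly n(n-1)/2.

```python
from typing import List


def naive_quicksort(xs: List[int], counter: List[int]) -> List[int]:
    """Quicksort with the first element as pivot; counter[0] tallies comparisons."""
    if len(xs) <= 1:
        return list(xs)
    pivot, rest = xs[0], xs[1:]
    left: List[int] = []
    right: List[int] = []
    for x in rest:
        counter[0] += 1  # one comparison against the pivot per element
        if x < pivot:
            left.append(x)
        else:
            right.append(x)
    return naive_quicksort(left, counter) + [pivot] + naive_quicksort(right, counter)


def count_comparisons(xs: List[int]) -> int:
    counter = [0]
    assert naive_quicksort(xs, counter) == sorted(xs)
    return counter[0]


n = 200
print(count_comparisons(list(range(n))))  # sorted input: n*(n-1)/2 = 19900
```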

Merge-sort uses extra memory, but is particularly suitable for external sorting (i.e. huge files that don't fit into memory).

Heap-sort can sort in-place and doesn't have the worst-case quadratic behavior, but in practice it is usually slower than quicksort on average.

Where only integers in a restricted range are involved, you can use some kind of radix sort to make it very fast.

In 99% of the cases, you'll be fine with the library sorts, which are usually based on quicksort.

Eli Bendersky
+1: For "In 99% of the cases, you'll be fine with the library sorts, which are usually based on quicksort".
Jim G.
Randomized pivoting gives Quicksort a runtime of O(n log n) for all practical purposes, without needing any guarantees about bad data. I really don't think anyone implements an O(n^2) quicksort for production code.
MAK
+5  A: 

First, a definition, since it's pretty important: A stable sort is one that's guaranteed not to reorder elements with identical keys.
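As a quick illustration: Python's built-in `sorted` is documented to be stable, so records that compare equal on the sort key keep their original relative order. An unstable sort would be free to emit `bob` before `carol`.

```python
records = [("carol", 2), ("alice", 1), ("bob", 2), ("dave", 1)]

# Sort by the numeric key only; names with equal keys keep their input order.
by_key = sorted(records, key=lambda r: r[1])
print(by_key)
# [('alice', 1), ('dave', 1), ('carol', 2), ('bob', 2)]
```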

Recommendations:

Quick sort: When you don't need a stable sort and average case performance matters more than worst case performance. A quick sort is O(N log N) on average, O(N^2) in the worst case. A good implementation uses O(log N) auxiliary storage in the form of stack space for recursion.
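A minimal in-place sketch, assuming a Lomuto partition and a random pivot (the random pivot is an illustrative choice, not part of the answer above). Recursing only into the smaller partition and looping on the larger one is what keeps the stack at O(log N) even when a partition is badly unbalanced. Note that Lomuto partitioning still degrades on inputs with many equal keys; a three-way partition is the usual fix.

```python
import random
from typing import List


def _partition(a: List[int], lo: int, hi: int) -> int:
    """Lomuto partition of a[lo..hi] around a randomly chosen pivot."""
    p = random.randint(lo, hi)
    a[p], a[hi] = a[hi], a[p]
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i


def quicksort(a: List[int]) -> None:
    """Sort a in place; not stable."""
    def sort(lo: int, hi: int) -> None:
        while lo < hi:
            p = _partition(a, lo, hi)
            # Recurse on the smaller side, loop on the larger one: each recursive
            # call covers at most half the range, so the depth is O(log N).
            if p - lo < hi - p:
                sort(lo, p - 1)
                lo = p + 1
            else:
                sort(p + 1, hi)
                hi = p - 1
    sort(0, len(a) - 1)
```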

Merge sort: When you need a stable, O(N log N) sort, this is about your only option. The only downsides to it are that it uses O(N) auxiliary space and has a slightly larger constant than a quick sort. There are some in-place merge sorts, but AFAIK they are all either not stable or worse than O(N log N). Even the O(N log N) in-place sorts have so much larger a constant than the plain old merge sort that they're more theoretical curiosities than useful algorithms.
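A minimal top-down sketch with an optional `key` function (an illustrative parameter). The O(N) auxiliary space is the merge buffer, and the `<=` on ties is exactly where stability comes from.

```python
from typing import Any, Callable, List, TypeVar

T = TypeVar("T")


def merge_sort(a: List[T], key: Callable[[T], Any] = lambda x: x) -> List[T]:
    """Return a new sorted list: stable, O(N log N) time, O(N) extra space."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left = merge_sort(a[:mid], key)
    right = merge_sort(a[mid:], key)
    merged: List[T] = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Taking from the left half on ties (<=) is what keeps the sort stable.
        if key(left[i]) <= key(right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One half is exhausted; the remainder of the other is already in order.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```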

Heap sort: When you don't need a stable sort and you care more about worst case performance than average case performance. It's guaranteed to be O(N log N), and uses O(1) auxiliary space, meaning that you won't unexpectedly run out of heap or stack space on very large inputs.
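A minimal in-place sketch: build a max-heap inside the array, then repeatedly swap the maximum to the end and shrink the heap. Sift-down is written as a loop rather than recursion, so the extra space really is O(1).

```python
from typing import List


def _sift_down(a: List[int], root: int, end: int) -> None:
    """Restore the max-heap property for the subtree at root, within a[:end]."""
    while True:
        child = 2 * root + 1
        if child >= end:
            return
        if child + 1 < end and a[child + 1] > a[child]:
            child += 1  # pick the larger child
        if a[root] >= a[child]:
            return
        a[root], a[child] = a[child], a[root]
        root = child


def heap_sort(a: List[int]) -> None:
    """In-place, not stable, O(N log N) worst case, O(1) extra space."""
    n = len(a)
    # Build a max-heap bottom-up.
    for start in range(n // 2 - 1, -1, -1):
        _sift_down(a, start, n)
    # Move the current maximum to the end, then shrink the heap by one.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        _sift_down(a, 0, end)
```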

Introsort: This is a quick sort that switches to a heap sort after a certain recursion depth to get around quick sort's O(N^2) worst case. It's almost always better than a plain old quick sort, since you get the average case of a quick sort, with guaranteed O(N log N) performance. Probably the only reason to use a heap sort instead of this is in severely memory constrained systems where O(log N) stack space is practically significant.
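A compact sketch of that scheme, assuming a depth limit of 2*floor(log2 N) (a common choice) and a hypothetical cutoff `SMALL = 16` below which insertion sort finishes the range. The pivot here is a median of three; the exact constants vary between real implementations.

```python
import math
from typing import List

SMALL = 16  # hypothetical cutoff: ranges this small go to insertion sort


def _insertion_sort(a: List[int], lo: int, hi: int) -> None:
    for i in range(lo + 1, hi + 1):
        x, j = a[i], i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x


def _heap_sort_range(a: List[int], lo: int, hi: int) -> None:
    """Heap sort a[lo..hi] in place; heap indices are offset by lo."""
    n = hi - lo + 1

    def sift(root: int, end: int) -> None:
        while True:
            child = 2 * root + 1
            if child >= end:
                return
            if child + 1 < end and a[lo + child + 1] > a[lo + child]:
                child += 1
            if a[lo + root] >= a[lo + child]:
                return
            a[lo + root], a[lo + child] = a[lo + child], a[lo + root]
            root = child

    for start in range(n // 2 - 1, -1, -1):
        sift(start, n)
    for end in range(n - 1, 0, -1):
        a[lo], a[lo + end] = a[lo + end], a[lo]
        sift(0, end)


def _partition(a: List[int], lo: int, hi: int) -> int:
    """Lomuto partition using the median of a[lo], a[mid], a[hi] as pivot."""
    mid = (lo + hi) // 2
    m = sorted([(a[lo], lo), (a[mid], mid), (a[hi], hi)])[1][1]
    a[m], a[hi] = a[hi], a[m]
    pivot, i = a[hi], lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i


def introsort(a: List[int]) -> None:
    if len(a) < 2:
        return

    def sort(lo: int, hi: int, depth: int) -> None:
        while hi - lo + 1 > SMALL:
            if depth == 0:
                # Quicksort is recursing too deeply: fall back to heap sort.
                _heap_sort_range(a, lo, hi)
                return
            depth -= 1
            p = _partition(a, lo, hi)
            if p - lo < hi - p:  # recurse on the smaller side
                sort(lo, p - 1, depth)
                lo = p + 1
            else:
                sort(p + 1, hi, depth)
                hi = p - 1
        _insertion_sort(a, lo, hi)

    sort(0, len(a) - 1, 2 * int(math.log2(len(a))))
```

An all-equal input is a handy way to exercise the fallback: Lomuto partitioning peels off one element per step, so the depth limit runs out and heap sort takes over.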

Insertion sort: When N is guaranteed to be small, including as the base case of a quick sort or merge sort. While this is O(N^2), it has a very small constant and is a stable sort.
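A minimal sketch. The inner loop stops at the first element that is not greater than `x`, which preserves the order of equal keys, and on already-sorted input it does no shifting at all, so that case is O(N).

```python
from typing import List


def insertion_sort(a: List[int]) -> None:
    """Stable, in-place; O(N^2) worst case, O(N) on already-sorted input."""
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        # Shift strictly larger elements one slot right; equal ones stay put.
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
```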

Bubble sort, selection sort: When you're doing something quick and dirty and for some reason you can't just use the standard library's sorting algorithm. The only advantage these have over insertion sort is being slightly easier to implement.
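For comparison, a minimal bubble sort sketch with the usual early-exit flag. (The common array form of selection sort, which swaps the minimum into place, is not stable, whereas this bubble sort is.)

```python
from typing import List


def bubble_sort(a: List[int]) -> None:
    """In-place, stable, O(N^2); the early exit makes sorted input O(N)."""
    n = len(a)
    for end in range(n - 1, 0, -1):
        swapped = False
        for i in range(end):
            if a[i] > a[i + 1]:  # strict > keeps equal elements in order
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        if not swapped:
            break  # a full pass with no swaps means the list is sorted
```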


Non-comparison sorts: Under some fairly limited conditions it's possible to break the O(N log N) barrier and sort in O(N). Here are some cases where that's worth a try:

Counting sort: When you are sorting integers with a limited range.
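A minimal sketch, assuming the caller knows the bounds `lo` and `hi` (hypothetical parameters) of the integer range. It runs in O(N + K) time with K = hi - lo + 1, so it only pays off when K is not much larger than N. This version rebuilds bare integers; sorting records stably by key needs the prefix-sum variant.

```python
from typing import List


def counting_sort(a: List[int], lo: int, hi: int) -> List[int]:
    """Sort integers known to lie in [lo, hi] in O(N + K) time."""
    counts = [0] * (hi - lo + 1)
    for x in a:
        counts[x - lo] += 1
    out: List[int] = []
    for offset, c in enumerate(counts):
        out.extend([lo + offset] * c)  # emit each value as many times as it occurred
    return out
```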

Radix sort: When log(N) is significantly larger than K, where K is the number of radix digits.
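A minimal least-significant-digit sketch for non-negative integers only, assuming 8-bit digits (`bits_per_digit` is an illustrative parameter). Each pass is a stable bucketing by one digit, so K passes cost O(K * (N + 256)); that beats O(N log N) roughly when K is small relative to log N.

```python
from typing import List


def radix_sort(a: List[int], bits_per_digit: int = 8) -> List[int]:
    """LSD radix sort for non-negative ints: one stable bucketing pass per digit."""
    if not a:
        return []
    base = 1 << bits_per_digit
    mask = base - 1
    out = list(a)
    max_val = max(out)
    shift = 0
    while (max_val >> shift) > 0:
        # Stable bucketing by the current digit preserves the order from earlier passes.
        buckets: List[List[int]] = [[] for _ in range(base)]
        for x in out:
            buckets[(x >> shift) & mask].append(x)
        out = [x for b in buckets for x in b]
        shift += bits_per_digit
    return out
```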

Bucket sort: When you can guarantee that your input is approximately uniformly distributed.
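A minimal sketch, assuming the inputs are floats in [0, 1). With N buckets and roughly uniform data each bucket holds O(1) items on average, giving expected O(N) time. Skewed data piles into a few buckets and loses that benefit.

```python
from typing import List


def bucket_sort(a: List[float]) -> List[float]:
    """Assumes values in [0, 1); expected O(N) when they are roughly uniform."""
    n = len(a)
    if n == 0:
        return []
    buckets: List[List[float]] = [[] for _ in range(n)]
    for x in a:
        # min() guards against float rounding pushing x * n up to n.
        buckets[min(int(x * n), n - 1)].append(x)
    out: List[float] = []
    for b in buckets:
        out.extend(sorted(b))  # each bucket is expected to be tiny
    return out
```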

dsimcha
As I recall, heap sort also has a very predictable running time in that there is little variation among different inputs of the same size, but that's of less interest than its constant space bound. I also find insertion sort the easiest to implement of the n^2 sorts, but maybe that's just me. Finally, you might also want to mention Shell sort, which is almost as simple to implement as insertion sort but has better performance, though still not n log n.
jk
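A minimal sketch of the Shell sort suggestion, assuming Knuth's gap sequence 1, 4, 13, 40, ... (other gap sequences exist and change the running time). It is gapped insertion sort, finishing with an ordinary insertion-sort pass at gap 1. Unlike insertion sort, it is not stable.

```python
from typing import List


def shell_sort(a: List[int]) -> None:
    """In-place Shell sort using Knuth's gaps 1, 4, 13, 40, ...; not stable."""
    n = len(a)
    gap = 1
    while gap < n // 3:
        gap = 3 * gap + 1
    while gap >= 1:
        # Gapped insertion sort: each gap-separated subsequence becomes sorted.
        for i in range(gap, n):
            x = a[i]
            j = i
            while j >= gap and a[j - gap] > x:
                a[j] = a[j - gap]
                j -= gap
            a[j] = x
        gap //= 3
```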
Don't forget [Bogosort](http://en.wikipedia.org/wiki/Bogosort)! ;-)
Alex Brasetvik
+1 Very interesting. Would you care to explain how you can "guarantee ... approximately uniformly distributed." for Bucket Sort?
drspod
@drspod: You'd have to know something about the nature of your data and where it was coming from. This knowledge might come from the problem domain. There are no deep theoretical tricks to "guaranteeing your data is approximately uniformly distributed."
dsimcha
@dsimcha Do you just mean that the number of data points within a range is bounded for all ranges?
drspod
It means that, within the range of values your data can take, all values have approximately equal probability. http://en.wikipedia.org/wiki/Uniform_distribution_%28discrete%29
dsimcha
I think saying Introsort is "almost always better than plain old quicksort" could be misleading. A Quicksort has the *possibility* of poor worst case performance, but it's *extremely* remote with a good implementation. Introsort is nearly *always* slower in return for eliminating what's normally a purely theoretical possibility. That's not to say it's a poor choice, only that you're trading slightly worse average performance in return for considerably better worst case performance.
Jerry Coffin
Why would introsort be substantially slower than quick sort? The only overhead is counting recursion depth, which should be negligible. It only switches after recursion is much deeper than it should be in a good quick sort case.
dsimcha
@dsimcha: The difference favoring quicksort is certainly quite small, but it's still there...
Jerry Coffin