views:

318

answers:

4

I found that java.util.Arrays.sort(Object[]) use 2 kinds of sorting algorithms(in JDK 1.6).

pseudocode:

if(array.length<7)
   insertionSort(array);
else
   mergeSort(array);

Why does it need 2 kinds of sorting here? for efficiency?

+4  A: 

It's for speed. The overhead of mergeSort is high enough that for short arrays it would be slower than insertion sort.

DJClayworth
+1  A: 

It appears that they believe mergeSort(array) is slower for short arrays. Hopefully they actually tested that.

ChrisH
+21  A: 

It's important to note that an algorithm that is O(N log N) is not always faster in practice than an O(N^2) algorithm. It depends on the constants, and the range of N involved. (Remember that asymptotic notation measures relative growth rate, not absolute speed).

For small N, insertion sort in fact does beat merge sort. It's also faster for almost-sorted arrays.

Here's a quote:

Although it is one of the elementary sorting algorithms with O(N^2) worst-case time, insertion sort is the algorithm of choice either when the data is nearly sorted (because it is adaptive) or when the problem size is small (because it has low overhead).

For these reasons, and because it is also stable, insertion sort is often used as the recursive base case (when the problem size is small) for higher overhead divide-and-conquer sorting algorithms, such as merge sort or quick sort.

Here's another quote from Best sorting algorithm for nearly sorted lists paper:

straight insertion sort is best for small or very nearly sorted lists

What this means is that, in practice:

  • Some algorithm A1 with higher asymptotic upper bound may be preferable than another known algorithm A2 with lower asymptotic upper bound
  • Some hybrid algorithms may adapt different algorithms depending on the input size

Related questions


A numerical example

Let's consider these two functions:

  • f(x) = 2x^2; this function has a quadratic growth rate, i.e. "O(N^2)"
  • g(x) = 10x; this function has a linear growth rate, i.e. "O(N)"

Now let's plot the two functions together:

alt text
Source: WolframAlpha: plot 2x^2 and 10x for x from 0 to 10

Note that between x=0..5, f(x) <= g(x), but for any larger x, f(x) quickly outgrows g(x).

Analogously, if A1 is a quadratic algorithm with a low overhead, and A2 is a linear algorithm with a high overhead, for smaller input, A1 may be faster than A2.

Thus, you can, should you choose to do so, create a hybrid algorithm A3 which simply selects one of the two algorithms depending on the size of the input. Whether or not this is worth the effort depends on the actual parameters involved.

Many tests and comparisons of sorting algorithms have been made, and it was decided that because insertion sort beats merge sort for small arrays, it was worth it to implement both for Arrays.sort.

polygenelubricants
You may like to compare the graph above with some actual measurements I took between an insertion sort and Java's sort: http://www.javamex.com/tutorials/collections/sorting_java_algorithm_performance_2.shtml
Neil Coffey
In addition to this excellent analysis, note that there are _two different kinds_ of insertion sort commonly used--the regular one and "binary insertion sort" where you find the spot to insert into by binary search, and then shift everything to make space. On most processors these days, a swap is faster than a compare, and binary insertion sort reduces the number of compares. Thus, usually, under the hood you'll find a binary insertion sort.
Rex Kerr
+1  A: 

Quoted from: http://en.wikipedia.org/wiki/Insertion_sort

Some divide-and-conquer algorithms such as quicksort and mergesort sort by 
recursively dividing the list into smaller sublists which are then sorted. 
A useful optimization in practice for these algorithms is to use insertion 
sort for sorting small sublists, where insertion sort outperforms these more 
complex algorithms. The size of list for which insertion sort has the advantage 
varies by environment and implementation, but is typically between eight and 
twenty elements.
tszming