I was working on implementing a quicksort yesterday, and then I ran it, expecting a faster runtime than the mergesort (which I had also implemented). I ran the two, and while the quicksort was faster for smaller data sets (<100 elements, and I did verify that it works), the mergesort became the quicker algorithm fairly quickly. I had been taught that quicksort is almost always "quicker" than mergesort, and I understand that there is some debate on this topic, but I at least expected it to be closer than this. For data sets of >10000 elements, the mergesort was over 4 times faster. Is this to be expected, or is there an error in my quicksort code?

mergesort:

public static void mergeSort(int[] e)
{
    if (e.length <= 1) return;
    int[] first = new int[e.length/2];
    int[] second = new int[e.length - first.length];
    System.arraycopy(e, 0, first, 0, first.length);
    System.arraycopy(e, first.length, second, 0, second.length);
    mergeSort(first);
    mergeSort(second);
    System.arraycopy(merge(first, second), 0, e, 0, e.length);
}

private static int[] merge(int[] first, int[] second) {
    int iFirst = 0;
    int iSecond = 0;
    int iCombined = 0;

    int[] combined = new int[first.length + second.length];
    while(iFirst < first.length && iSecond < second.length) {
        if (first[iFirst] > second[iSecond]) {
            combined[iCombined++] = second[iSecond++];
        }
        else combined[iCombined++] = first[iFirst++];
    }
    for(; iFirst < first.length; iFirst++) {
        combined[iCombined++] = first[iFirst];
    }
    for(; iSecond < second.length; iSecond++) {
        combined[iCombined++] = second[iSecond];
    }
    return combined;
}

quicksort:

public static void quicksort(int[] a, int first, int last) {
    if (first >= last) return;

    int partitionIndex = partition(a, first, last);
    quicksort(a, first, partitionIndex - 1);
    quicksort(a, partitionIndex + 1, last);
}

public static int partition(int[] x, int first, int last) {
    int left = first;
    int right = last;
    int pivot = x[first];
    int pivotIdx = first;

    while(left <= right) {
        while(left < x.length && x[left] <= pivot) left++;
        while(right >= 0 && x[right] > pivot) right--;
        if (left <= right) {
            int temp = x[left];
            x[left] = x[right];
            x[right] = temp;
        }
    }
    pivotIdx = right;
    x[first] = x[right];
    x[pivotIdx] = pivot;
    return pivotIdx;
}
+1  A: 

Previously discussed on SO: "Why is quicksort better than mergesort?"


yawmark
+2  A: 

Based on this Wikipedia article, your results are expected.

JoshBerke
A: 

I saw the other SO discussion, but that one claims that mergesort is slower than quicksort (which is what my peers claim), and the Wikipedia article claims that mergesort makes fewer comparisons than quicksort; but I have always heard that quicksort is supposed to be blindingly fast, even though it has the same n log n time complexity as mergesort. Additionally, my quicksort algorithm is an in-place quicksort, so would that have a negative effect on the sort's time?

John Murano
Please add comments and clarifications to your question; don't add them as an answer, since they don't really answer your question.
Joachim Sauer
A: 

I could imagine that by directly accessing the memory, using C for example, one can improve the performance of Quicksort more than is possible with Mergesort.

Another reason is that Mergesort needs more memory, because it's hard to implement it as an in-place sort.

And specifically for your implementation, you could improve the choice of pivot; there are a lot of different algorithms for finding a good pivot.

As can be seen on Wikipedia, one can implement Quicksort in different ways.
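
For instance, one simple improvement is a random pivot. A minimal sketch, assuming it lives in the same class as the question's code (randomizedPartition is a made-up name; it just moves a random element to the front and reuses the question's partition()):

import java.util.Random;

// Sketch: choose a random pivot, move it to x[first], then reuse partition(),
// which always pivots on the first element of the range.
public static int randomizedPartition(int[] x, int first, int last, Random rng) {
    int p = first + rng.nextInt(last - first + 1);    // random index in [first, last]
    int tmp = x[first]; x[first] = x[p]; x[p] = tmp;  // move the chosen pivot to the front
    return partition(x, first, last);
}

// Usage: replace the partition(a, first, last) call in quicksort()
// with randomizedPartition(a, first, last, rng), passing a shared Random down.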

Georg
A: 

So my runtime might be improved if I chose a pivot with median-of-three or a random pivot (for the purposes of this question, I am working with an array of unordered pseudorandom data).

Additionally, I don't take into account how slow recursion is, but that shouldn't matter, because both algorithms are recursive, and I have read on Wikipedia that quicksort is (supposedly) faster because it has a shallower recursion depth than mergesort.

John Murano
No, recursion makes all the difference. Your quicksort recurses for subarrays of length 2 (and 3).
Stephan Eggermont
A: 

Were your datasets random enough? Were they partially sorted?

That might affect the speed of the sort...

For example, in quicksort's partition(), you'd skip along while the numbers are in sorted order, until you find one that's not.

Calyth
A: 

It might depend on what sort of data you're sorting for the testing (already ordered lists, randomized, reverse sorted). Also, quicksort will probably be faster in general if you pick a random pivot instead of using the first element.

DShook
A: 

OK thanks for the help. Like I said above, the datasets I was using were randomly generated by Math.random(), etc... but I will try and implement it with a random pivot.

John Murano
+1  A: 

Merge sort's worst case is quicksort's average case, so if you don't have a good implementation, merge sort is going to be faster overall. Getting quicksort to work fast is about avoiding sub-average cases. Choose a better pivot (median-of-3 helps) and you'll see a difference.
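
A hedged sketch of median-of-3 applied to the code in the question (medianOfThreeToFront and swap are illustrative helper names; calling it right before partition() means the first-element pivot becomes the median of three samples):

// Sketch: sort x[first], x[mid], x[last] among themselves, then move the
// median to x[first] so the question's first-element partition() uses it.
public static void medianOfThreeToFront(int[] x, int first, int last) {
    int mid = first + (last - first) / 2;
    if (x[mid] < x[first]) swap(x, mid, first);
    if (x[last] < x[first]) swap(x, last, first);
    if (x[last] < x[mid]) swap(x, last, mid);
    swap(x, first, mid);   // the median now sits at mid; move it to the front
}

private static void swap(int[] x, int i, int j) {
    int t = x[i]; x[i] = x[j]; x[j] = t;
}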

Ray Hidayat
A: 

I've checked the data that goes into the algorithm, and it definitely does not fit any of quicksort's worst cases.

John Murano
+1  A: 

One of the advantages of quicksort for relatively small array sizes is just an artifact of hardware implementation.

On arrays, quicksort can be done in-place, meaning that you're reading from and writing to the same area of memory. Mergesort, on the other hand, typically requires allocating new buffers, meaning your memory access is more spread out. You can see both of these behaviors in your example implementations.

As a result, for relatively small datasets, quicksort is more likely to get cache hits and therefore just tends to run faster on most hardware.

Mergesort is still a pretty good solution for large data sets or other data structures, like linked lists, as your experiments confirm.

fastcall
A: 

Actually, I just ran them again, and mergesort starts to get faster for sets of >30000 items; up until then, quicksort is faster.

John Murano
+5  A: 

I actually just wrote a "linked-list comparative sort demo program" in C and arrived at a similar conclusion (that mergesort will beat quicksort for most uses), although I have been told that quicksort is generally not used for linked lists anyway. I would note that the choice of pivot values is a monster factor -- my initial version used a random node as the pivot, and when I refined it a bit to take a mean of two (random) nodes, the execution time for 1000000 records went from over 4 minutes to less than 10 seconds, putting it right on par with mergesort.

Mergesort and quicksort have the same big-O best case (n*log(n)), and despite what people may try to claim, big O is really about iteration count and not comparison count. The biggest difference that can be produced between the two of them will always be to quicksort's detriment, and it involves lists that are already largely sorted or contain a large number of ties (when quicksort does better than mergesort, the difference will not be nearly so great). This is because ties or already sorted segments streamline straight through mergesort: when two split lists come back to be merged, if one list already contains all the smaller values, all of the values on the left will be compared one at a time to the first element of the right, and then (since the returned lists have an internal order) no further comparisons need be done and the right is simply iterated onto the end. That is to say, the number of iterations will remain constant, but the number of comparisons is cut in half. If you are talking about actual time and are sorting strings, it's the comparisons that are expensive.

Ties and already sorted segments in quicksort can easily lead to unbalanced lists if the pivot value is not carefully determined, and the unbalanced lists (e.g., one on the right, ten on the left) are what causes the slowdown. So, if you can get your quicksort to perform as well on an already sorted list as it does on a randomized list, you've got a good method for finding the pivot.
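
To make the sorted-segment point concrete with the array-based Java code from the question (rather than the C linked-list demo described here), a hedged sketch of a merge wrapper: if the last element of the first half is no bigger than the first element of the second half, the two halves are simply concatenated with no comparisons at all (mergeWithShortcut is an illustrative name; it falls back to the question's merge() otherwise).

// Sketch: exploit already-ordered halves before doing a full merge.
private static int[] mergeWithShortcut(int[] first, int[] second) {
    if (first.length == 0 || second.length == 0
            || first[first.length - 1] <= second[0]) {
        int[] combined = new int[first.length + second.length];
        System.arraycopy(first, 0, combined, 0, first.length);
        System.arraycopy(second, 0, combined, first.length, second.length);
        return combined;   // zero element comparisons in this case
    }
    return merge(first, second);   // otherwise use the question's merge()
}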

If you're interested, the demo program produces output like this:

[root~/C] ./a.out -1 3 
Using "", 0 records
Primary Criteria offset=128

Command (h for help, Q to quit): N
How many records? 4000000
New list is 562500.00 kb

Command (h for help, Q to quit): m

Mergesorting..............3999999 function calls
123539969 Iterations     Comparison calls: 82696100
Elapsed time: 0 min 9 sec


Command (h for help, Q to quit): S
Shuffled.

Command (h for help, Q to quit): q

Quicksorting..............4000000 function calls
190179315 Iterations     Comparison calls: 100817020
Elapsed time: 0 min 23 sec

Although without the krazy kolors. The source code is here, and there's some more stuff about it by me and some discussion about halfway down this page.

P.S. Neither sort requires extra memory with the linked list.

mk27
A: 

(1) There is a qsort algorithm, used by C's qsort(), which doesn't require extra memory. This was most probably invented by Hoare. This makes qsort() fast in C.

(2) Randomizing the data before running qsort will almost always speed it up (a sketch of such a shuffle follows below).

(3) Selecting the median element as the pivot may make it faster.
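
A hedged Java sketch of point (2), shuffling the array with a Fisher-Yates pass before calling the question's quicksort (the shuffle method name is illustrative):

import java.util.Random;

// Sketch: Fisher-Yates shuffle. It destroys any pre-existing order (sorted or
// reverse-sorted runs) that would hurt a first-element-pivot quicksort.
public static void shuffle(int[] a) {
    Random rng = new Random();
    for (int i = a.length - 1; i > 0; i--) {
        int j = rng.nextInt(i + 1);              // uniform index in [0, i]
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}

// Usage: shuffle(a); quicksort(a, 0, a.length - 1);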

Vardhan Varma
A: 

My slower qsort algo is an in-place sort, meaning that it doesn't need any extra memory either.

John Murano
A: 

This is consistent with the analysis of the algorithms. Merge-sort is guaranteed O(nlogn) for any input, on every run. Quicksort is best-case O(nlogn) and average-case O(nlogn), but worst-case O(n^2), so the average execution will be in between O(nlogn) and O(n^2).

Quicksort is the best general-case algorithm because it has low overhead, so it has good speed for values of n up to about 10000 or so and still good runtime for arbitrarily astronomical values of n. Merge-sort has the unfortunate overhead of writing a stack frame, required by every recursive call. Thus, for low values of n it has an atrociously high c in RT = c*n*log(n), and it is not the preferred general sorting method.

Edit: Software Monkey pointed out a contradiction: Quicksort averages O(nlogn) for random input, but can be O(n^2) in the worst case. So it actually is somewhat bound by the entropy of your data -- or you could choose the pivot randomly. I might still be off a bit though.

Overflown
Quicksort can't be both "average case O(nlogn)" and "average ... between O(nlogn) and O(n^2)".
Software Monkey
Sorry, average O(nlogn) for random input, but O(n^2) worst case. So it actually is somewhat bound by the entropy.
Overflown
A: 

For good performance of quicksort, it is important not to recurse all the way down to lists of length 1.

You should consider sorting lists of 2, 3 and even 4 elements with nested ifs, swapping where necessary. Let us know how the performance changes.
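
A hedged sketch of that idea, reusing the question's partition() for larger ranges and hard-coding the 2- and 3-element cases (quicksortWithBase and swap are illustrative names):

public static void quicksortWithBase(int[] a, int first, int last) {
    int n = last - first + 1;
    if (n <= 1) return;
    if (n == 2) {                                    // one comparison, at most one swap
        if (a[last] < a[first]) swap(a, first, last);
        return;
    }
    if (n == 3) {                                    // three-element sorting network
        if (a[first + 1] < a[first]) swap(a, first, first + 1);
        if (a[last] < a[first]) swap(a, first, last);
        if (a[last] < a[first + 1]) swap(a, first + 1, last);
        return;
    }
    int p = partition(a, first, last);               // question's partition for larger ranges
    quicksortWithBase(a, first, p - 1);
    quicksortWithBase(a, p + 1, last);
}

private static void swap(int[] a, int i, int j) {
    int t = a[i]; a[i] = a[j]; a[j] = t;
}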

Thorbjørn Ravn Andersen
A: 

If you implement heap sort as the base sorting algorithm in quicksort's worst-case scenario, you achieve a theta(n log n) algorithm.

If you don't need stable sorting, and aren't sorting a linked list, I think that would be the fastest you could go.
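
That combination is essentially introsort. A hedged sketch: cap the recursion depth at roughly 2*log2(n) and fall back to an in-place heap sort when the cap is hit (introsort, heapSort and siftDown are illustrative names; partition() is the one from the question):

// Sketch: depth-limited quicksort with a heap-sort fallback, giving an
// O(n log n) worst case while keeping quicksort's average-case behaviour.
public static void introsort(int[] a) {
    int depthLimit = 2 * (32 - Integer.numberOfLeadingZeros(Math.max(1, a.length)));
    introsort(a, 0, a.length - 1, depthLimit);
}

private static void introsort(int[] a, int first, int last, int depthLimit) {
    if (first >= last) return;
    if (depthLimit == 0) {                 // partitions have degenerated; switch to heap sort
        heapSort(a, first, last);
        return;
    }
    int p = partition(a, first, last);     // question's partition
    introsort(a, first, p - 1, depthLimit - 1);
    introsort(a, p + 1, last, depthLimit - 1);
}

private static void heapSort(int[] a, int first, int last) {
    int n = last - first + 1;
    for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, first, i, n);        // build max-heap
    for (int end = n - 1; end > 0; end--) {
        int t = a[first]; a[first] = a[first + end]; a[first + end] = t;  // move max to the end
        siftDown(a, first, 0, end);
    }
}

private static void siftDown(int[] a, int offset, int root, int size) {
    while (2 * root + 1 < size) {
        int child = 2 * root + 1;
        if (child + 1 < size && a[offset + child + 1] > a[offset + child]) child++;
        if (a[offset + root] >= a[offset + child]) return;
        int t = a[offset + root]; a[offset + root] = a[offset + child]; a[offset + child] = t;
        root = child;
    }
}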

Yngve Sneen Lindal
+1  A: 

Mergesort is a lot slower for random array-based data, as long as it fits in RAM. This is the first time I've seen it debated.

  • qsort the shortest subarray first.
  • Switch to insertion sort below 5-25 elements.
  • Do a normal pivot selection.

Your qsort is very slow because it tries to partition and qsort arrays of length 2 and 3. A sketch combining these changes is below.
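
A hedged sketch combining those three suggestions with the question's partition(): a middle-element pivot, an insertion-sort cutoff, and recursing into the shorter subarray while iterating on the longer one (quicksortTuned, insertionSort and CUTOFF are illustrative names; the cutoff value is something to tune, per "5-25 elements"):

private static final int CUTOFF = 16;   // somewhere in the suggested 5-25 range

public static void quicksortTuned(int[] a, int first, int last) {
    while (last - first + 1 > CUTOFF) {
        int mid = first + (last - first) / 2;             // "normal" pivot: middle element,
        int t = a[first]; a[first] = a[mid]; a[mid] = t;  // moved to the front for partition()
        int p = partition(a, first, last);                // question's partition
        if (p - first < last - p) {                       // recurse into the shorter side first...
            quicksortTuned(a, first, p - 1);
            first = p + 1;                                // ...then loop on the longer side
        } else {
            quicksortTuned(a, p + 1, last);
            last = p - 1;
        }
    }
    insertionSort(a, first, last);                        // small ranges: insertion sort
}

private static void insertionSort(int[] a, int first, int last) {
    for (int i = first + 1; i <= last; i++) {
        int key = a[i];
        int j = i - 1;
        while (j >= first && a[j] > key) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}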

Stephan Eggermont
+1 for the switch to insertion sort; that should give a nice improvement.
Helper Method