The implementation below is stable because it uses <= instead of < at the line marked XXX. This also makes it more efficient. Is there any reason to use < rather than <= at this line?

/**
class for In place MergeSort
**/
class MergeSortAlgorithm extends SortAlgorithm {
    void sort(int a[], int lo0, int hi0) throws Exception {
        int lo = lo0;
        int hi = hi0;
        pause(lo, hi);
        if (lo >= hi) {
            return;
        }
        int mid = (lo + hi) / 2;

        /*
         *  Partition the list into two lists and sort them recursively
         */
        sort(a, lo, mid);
        sort(a, mid + 1, hi);

        /*
         *  Merge the two sorted lists
         */
        int end_lo = mid;
        int start_hi = mid + 1;
        while ((lo <= end_lo) && (start_hi <= hi)) {
            pause(lo);
            if (stopRequested) {
                return;
            }
            if (a[lo] <= a[start_hi]) {                   // LINE XXX
                lo++;
            } else {
                /*
                 *  a[lo] > a[start_hi]
                 *  The next element comes from the second list;
                 *  move the a[start_hi] element into the next
                 *  position and shuffle all the other elements up.
                 */
                int T = a[start_hi];
                for (int k = start_hi - 1; k >= lo; k--) {
                    a[k+1] = a[k];
                    pause(lo);
                }
                a[lo] = T;
                lo++;
                end_lo++;
                start_hi++;
            }
        }
    }

    void sort(int a[]) throws Exception {
        sort(a, 0, a.length - 1);
    }
}
+4  A: 

Because the <= in your code ensures that equal-valued elements (one in the left half and one in the right half of the range being sorted) are not exchanged. It also avoids useless exchanges.

if (a[lo] <= a[start_hi]) {
    /* The left value is less than or equal to the right one, so leave them as they are. */
    /* In particular, if the values are equal, they are not exchanged. */
    lo++;
} else {
    /*
     * The value in the right half is smaller than the one in the left half,
     * so insert the right one just before the left one, i.e., exchange them.
     */
    ...
}

Suppose the same value (e.g., ‘5’) appears in both halves and the operator above is <. The else branch would then also run for equal values, so the right ‘5’ would be inserted before the left ‘5’; in other words, equal-valued elements would be exchanged. This means the sort is not stable. It is also inefficient to exchange equal-valued elements.
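To make the difference visible, here is a minimal sketch that sorts {key, tag} records by key only, using the same shift-based merge as in the question. The class name `StabilityDemo`, the `useStrictLess` flag, and the record layout are assumptions made up for illustration; they are not part of the original code.

```java
import java.util.Arrays;

class StabilityDemo {
    // Each record is {key, tag}. Only the key is compared; the tag
    // remembers the original position so reordering of equal keys shows up.
    static void merge(int[][] a, int lo, int mid, int hi, boolean useStrictLess) {
        int endLo = mid;
        int startHi = mid + 1;
        while (lo <= endLo && startHi <= hi) {
            boolean keepLeft = useStrictLess
                    ? a[lo][0] < a[startHi][0]    // the '<' variant
                    : a[lo][0] <= a[startHi][0];  // the '<=' variant (LINE XXX)
            if (keepLeft) {
                lo++;
            } else {
                // Insert a[startHi] at position lo by shifting elements up.
                int[] t = a[startHi];
                for (int k = startHi - 1; k >= lo; k--) {
                    a[k + 1] = a[k];
                }
                a[lo] = t;
                lo++;
                endLo++;
                startHi++;
            }
        }
    }

    static void sort(int[][] a, int lo, int hi, boolean useStrictLess) {
        if (lo >= hi) {
            return;
        }
        int mid = (lo + hi) / 2;
        sort(a, lo, mid, useStrictLess);
        sort(a, mid + 1, hi, useStrictLess);
        merge(a, lo, mid, hi, useStrictLess);
    }

    // Sorts two records with equal keys and returns the result as text.
    static String run(boolean useStrictLess) {
        int[][] a = {{5, 0}, {5, 1}};
        sort(a, 0, a.length - 1, useStrictLess);
        return Arrays.deepToString(a);
    }

    public static void main(String[] args) {
        System.out.println("<= : " + run(false)); // <= : [[5, 0], [5, 1]]
        System.out.println("<  : " + run(true));  // <  : [[5, 1], [5, 0]]
    }
}
```

With <= the two ‘5’ records keep their original order; with < the right one is shifted in front of the left one, which costs an extra move and breaks stability.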


I guess the inefficiency comes from the algorithm itself. Your merging stage works like insertion sort: each element taken from the right half is inserted by shifting elements up, so a single merge is O(n^2) in the worst case.

You may have to re-implement the merge if you need to sort huge arrays.

habe
+1 Yes, the merge can actually be done in O(n). Only on small arrays (fewer than 7 elements, for example) does an insertion sort outperform merge sort, because of its small constant factor.
Helper Method
A: 

So why is in-place merge sort considered unstable while ordinary merge sort is not? The above implementation is stable.

I guess they assume that “merge sort has to be O(n log n).” Your algorithm is stable and merges subarrays, but it's O(n^2) (or O(n^2 log n)?). So it may be hard to call it merge sort in the technical sense used in computer science. I think the key is how to insert a smaller element in front of larger elements at O(1) cost. The natural way (and yours) costs O(n), and I guess there's no O(1) way. To avoid the insertion cost, the non-in-place algorithm appends the smaller element to another array (which is O(1)) instead of inserting it.
habe
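The append-to-another-array idea can be sketched roughly as follows. This is only an illustration under my own assumptions: the class name `BufferedMergeSort` and the scratch array `buf` are invented here, and it is not the in-place variant asked about.

```java
import java.util.Arrays;

class BufferedMergeSort {
    static void sort(int[] a) {
        int[] buf = new int[a.length]; // O(n) extra space, allocated once
        sort(a, buf, 0, a.length - 1);
    }

    static void sort(int[] a, int[] buf, int lo, int hi) {
        if (lo >= hi) {
            return;
        }
        int mid = lo + (hi - lo) / 2;
        sort(a, buf, lo, mid);
        sort(a, buf, mid + 1, hi);
        merge(a, buf, lo, mid, hi);
    }

    // Merges a[lo..mid] and a[mid+1..hi] in O(hi - lo + 1) time.
    static void merge(int[] a, int[] buf, int lo, int mid, int hi) {
        int i = lo;       // next unread element of the left half
        int j = mid + 1;  // next unread element of the right half
        int out = lo;     // next free slot in the buffer
        while (i <= mid && j <= hi) {
            // '<=' takes from the left half on ties, which keeps the sort stable.
            buf[out++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        }
        while (i <= mid) {
            buf[out++] = a[i++];
        }
        while (j <= hi) {
            buf[out++] = a[j++];
        }
        System.arraycopy(buf, lo, a, lo, hi - lo + 1);
    }

    public static void main(String[] args) {
        int[] data = {5, 2, 9, 1, 5, 6};
        sort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 5, 5, 6, 9]
    }
}
```

Each append to `buf` is O(1), so one merge is linear and the whole sort is O(n log n), at the price of O(n) extra memory.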
That makes sense. But if the data structure is a linked list, then in-place merge sort would be stable, since an insertion can be done in place in O(1) time.
I agree. It's stable, and it's a true merge sort. But for the linked-list algorithm, I think there is no difference between in-place and not in-place.
habe
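For the linked-list case, a merge step might look like the sketch below. The `Node` type with a `key` and a `tag` field, and the helpers `list` and `show`, are hypothetical names used only for this example.

```java
class ListMergeDemo {
    static final class Node {
        final int key;  // the value compared during the merge
        final int tag;  // original position, used only to show stability
        Node next;

        Node(int key, int tag) {
            this.key = key;
            this.tag = tag;
        }
    }

    // Merges two sorted lists by relinking nodes: O(1) per node, no shifting.
    static Node merge(Node left, Node right) {
        Node dummy = new Node(0, 0); // placeholder head, dropped at the end
        Node tail = dummy;
        while (left != null && right != null) {
            if (left.key <= right.key) { // ties go to the left list: stable
                tail.next = left;
                left = left.next;
            } else {
                tail.next = right;
                right = right.next;
            }
            tail = tail.next;
        }
        tail.next = (left != null) ? left : right;
        return dummy.next;
    }

    // Builds a list from {key, tag} pairs.
    static Node list(int[][] pairs) {
        Node head = null;
        for (int i = pairs.length - 1; i >= 0; i--) {
            Node n = new Node(pairs[i][0], pairs[i][1]);
            n.next = head;
            head = n;
        }
        return head;
    }

    // Renders a list as "key:tag key:tag ...".
    static String show(Node n) {
        StringBuilder sb = new StringBuilder();
        for (; n != null; n = n.next) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(n.key).append(':').append(n.tag);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node left = list(new int[][] {{1, 0}, {5, 1}});
        Node right = list(new int[][] {{5, 2}, {7, 3}});
        System.out.println(show(merge(left, right))); // 1:0 5:1 5:2 7:3
    }
}
```

No new nodes are allocated apart from the temporary placeholder head, which is why the in-place versus not-in-place distinction matters little for lists.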
So what does an implementation of an O(n log n) in-place merge sort look like for arrays? I assume it is very complex, but can someone point me to one? I want to review it.