Where can I get a decent high-level description of the algorithm used in __merge_without_buffer()
in the C++ STL? I'm trying to reimplement this code in the D programming language, with some enhancements. I can't seem to grok what it's doing at the algorithmic level from just reading the STL source code because there are too many low-level details obscuring it. Also, I don't want to just blindly translate the code because then, if it didn't work I wouldn't know why, and I wouldn't be able to add my enhancements.
views:
186answers:
1__merge_without_buffer()
is performing an in-place merge, as the merge step of an in-place merge sort. It takes as input two ranges of data [first, middle)
and [middle, last)
which are assumed to already be sorted. The len1
and len2
parameters are equal to the lengths of the two input ranges, namely (middle - first)
and (last - middle)
respectively.
First, it picks a pivot element. Then, it rearranges the data into the order A1 B1 A2 B2
, where A1
is the set of elements in [first, middle)
that are less than the pivot, A2
is the set of elements in [first, middle)
greater than or equal to the pivot, B1
is the set of elements in [middle, last)
less than the pivot, and B2
is the set of elements in [middle, last)
greater than or equal to the pivot. Note that the data is originally in the order A1 A2 B1 B2
, so all we need to do is to turn A2 B1
into B1 A2
. This is with a call to std::rotate()
, which does just that.
Now we've separated out the elements which are less than the pivot (A1
and B1
) from those that are greater than or equal to the pivot (A2
and B2
), so now we can recursively merge the two subranges A1 A2
and B1 B2
.
How do we choose a pivot? In the implementation I'm looking at, it chooses the median element from the larger subrange (i.e. if [first, middle)
has more elements than [middle, last)
, it chooses the median of [first, middle)
; otherwise, it chooses the median of [middle, last)
). Since the subranges are already sorted, choosing the median is trivial. This pivot choice ensures that when recursively merging the two subranges, each subproblem is no more than 3/4 the size of the current problem, because in the worst case, at least 1/4 of the elements are larger than or smaller than the pivot.
What is the running time of this? The std::rotate()
call takes O(N) time, and we make two recursive calls to ourselves. This equates to a running time of O(N log N). However, note that this is only one step in mergesort: remember that in mergesort you first recursively sort both halves and then merge. Thus, the recurrence relation for the running time of mergesort is now:
T(N) = 2T(N/2) + O(N log N)
Plug this into the Master theorem, and you get that mergesort now runs in O(N log2 N) time!
As an interesting final point, consider the following three qualities of a comparison-based sorting algorithm:
- In-place
- Stable
- Runs in O(N log N) time
You can usually only get 2 of these at once - quicksort gets you (1) and (3), mergesort gets you (2) and (3), and an in-place mergesort gets you (1) and (2). Non-comparison-based sorts such as count sort can achieve all 3, but those sorts can only sort certain data types. It's possible there exists a comparison-based sort which achieves all 3, but if there is, I'm not aware of its existence, and it's almost certainly much more complicated.