STL __merge_without_buffer algorithm?

__merge_without_buffer() is performing an in-place merge, as the merge step of an in-place merge sort. It takes as input two ranges of data [first, middle) and [middle, last) which are assumed to already be sorted. The len1 and len2 parameters are equal to the lengths of the two input ranges, namely (middle - first) and (last - middle) respectively.

First, it picks a pivot element. Then, it rearranges the data into the order A1 B1 A2 B2, where A1 is the set of elements in [first, middle) that are less than the pivot, A2 is the set of elements in [first, middle) greater than or equal to the pivot, B1 is the set of elements in [middle, last) less than the pivot, and B2 is the set of elements in [middle, last) greater than or equal to the pivot. Note that the data is originally in the order A1 A2 B1 B2, so all we need to do is to turn A2 B1 into B1 A2. This is with a call to std::rotate(), which does just that.

Now we've separated out the elements which are less than the pivot (A1 and B1) from those that are greater than or equal to the pivot (A2 and B2), so now we can recursively merge the two subranges A1 A2 and B1 B2.

How do we choose a pivot? In the implementation I'm looking at, it chooses the median element from the larger subrange (i.e. if [first, middle) has more elements than [middle, last), it chooses the median of [first, middle); otherwise, it chooses the median of [middle, last)). Since the subranges are already sorted, choosing the median is trivial. This pivot choice ensures that when recursively merging the two subranges, each subproblem is no more than 3/4 the size of the current problem, because in the worst case, at least 1/4 of the elements are larger than or smaller than the pivot.

What is the running time of this? The std::rotate() call takes O(N) time, and we make two recursive calls to ourselves. This equates to a running time of O(N log N). However, note that this is only one step in mergesort: remember that in mergesort you first recursively sort both halves and then merge. Thus, the recurrence relation for the running time of mergesort is now:

T(N) = 2T(N/2) + O(N log N)

Plug this into the Master theorem, and you get that mergesort now runs in O(N log² N) time!

As an interesting final point, consider the following three qualities of a comparison-based sorting algorithm:

In-place
Stable
Runs in O(N log N) time

You can usually only get 2 of these at once - quicksort gets you (1) and (3), mergesort gets you (2) and (3), and an in-place mergesort gets you (1) and (2). Non-comparison-based sorts such as count sort can achieve all 3, but those sorts can only sort certain data types. It's possible there exists a comparison-based sort which achieves all 3, but if there is, I'm not aware of its existence, and it's almost certainly much more complicated.

Does O(N log2 N) mean N (log N)^2? I was confused, because I thought log2 meant log_2, but the base is irrelevant here. Also, is the last part "you can only get 2 of these at once" strictly true, or is it simply that the algorithms that get all three are complicated? My guess is the latter?

A. Rex 2009-01-13 05:42:02

I've edited to address your questions. I messed up the HTML superscript tag. log^2 N means (log N)^2. Although base 2 is implied, the actual base is irrelevant, since it's a constant factor that gets eaten up by the big-O notation.

Adam Rosenfield 2009-01-13 05:51:14

In paragraph 3, I think you mean that subranges A1 and B1 get merged together recursively, and subranges A2 and B2 get merged recursively, right? After rotation, there is no subrange A1 A2 anymore.

Rob Kennedy 2009-01-13 05:56:36

Thanks, but now that I know it's O(N log N) not O(N) I'm not even sure I want to bother implementing it anymore, though I think it's a really cool algorithm, so I might just to see how slow it is in practice.

dsimcha 2009-01-13 14:51:11

Note that it's only called when you're low on memory. When callers can allocate a temporary buffer, they don't call the in-place version. They call the O(n) "adaptive" version instead. When it's between "slow" and failure, choose the former.

Rob Kennedy 2009-01-13 15:05:23

The standard quicksort is not in-place, it requires non-constant additional stack. There is a modified quicksort that runs in constant additional memory (storing its stack in the array using clever tricks), but it is rather ineffective, although still O(n*log(n)).

Rafał Dowgird 2009-01-13 16:52:15

True, but for most real world uses, O(log N) stack can be considered "almost" in place. It generates no heap activity and has minimal risk of overflowing the stack.

dsimcha 2009-01-13 18:35:31

I did end up implementing this to satisfy my curiosity, and it's not as slow as I thought. Thanks.

dsimcha 2009-01-14 04:53:57

Incidentally, I messaged Adam Rosenfield off-site, and thought some of you would like to know. From the standpoint of theory, there do exist algorithms that have "all good three qualities". See http://portal.acm.org/citation.cfm?id=892042 and http://www.springerlink.com/content/bx6d9xv8p6w3d3p4/

A. Rex 2009-01-14 20:00:56

Yes, I've looked at these a little. They seem incredibly hard to understand and implement, though I'd like to get one working. I'd assume there's a reason why they're not in STL.

dsimcha 2009-01-15 00:45:05

ansaurus

tags:

views:

answers:

STL __merge_without_buffer algorithm?

related questions