I'm trying to create an unusual associative array implementation that is very space-efficient, and I need a sorting algorithm that meets all of the following:

  1. Stable (Does not change the relative ordering of elements with equal keys.)
  2. In-place or almost in-place (O(log n) stack is fine, but no O(n) space usage or heap allocations).
  3. O(n log n) time complexity.

Also note that the data structure to be sorted is an array.

It's easy to see that there's a basic algorithm that matches any two of these three (insertion sort matches 1 and 2, merge sort matches 1 and 3, heap sort matches 2 and 3), but I cannot for the life of me find anything that matches all three of these criteria.

+4  A: 

What about quicksort?

Exchange sort can do that too; it might be more "stable" by your terms, but quicksort is faster.

davenpcj
The example given in http://en.wikipedia.org/wiki/Quicksort#Algorithm is stable, though not the most efficient version of qsort.
freespace
It's my understanding that variations of Quicksort can be made stable, or efficient, but not both at the same time.
cjm
+8  A: 

Merge sort can be written to be in-place, I believe. That may be the best route.

jjnguy
http://comjnl.oxfordjournals.org/cgi/content/abstract/35/6/643 This is probably the algorithm you want.
Corey D
+2  A: 

Perhaps shell sort? If I recall my data structures course correctly, it tended to be stable, but its worst-case time is O(n log^2 n), although it performs O(n) on almost-sorted data. It's based on insertion sort, so it sorts in place.
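
For reference, a minimal in-place shell sort sketch in C (the Knuth gap sequence is an arbitrary choice here, not from the answer); note that, as the comments below point out, the long-distance moves mean it is not stable in general:

/* In-place shell sort with the Knuth gap sequence (1, 4, 13, 40, ...).
   Not stable in general: gapped insertion moves can reorder equal keys. */
void shell_sort(int *ar, int n)
{
    int gap = 1;
    while (gap < n / 3)
        gap = 3 * gap + 1;
    for (; gap >= 1; gap /= 3)
        for (int i = gap; i < n; i++) {
            int v = ar[i], j = i;
            while (j >= gap && ar[j - gap] > v) {
                ar[j] = ar[j - gap];   /* shift larger elements right by one gap */
                j -= gap;
            }
            ar[j] = v;
        }
}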

Ryan
So it's sometimes stable? I think that is the exact definition of unstable :)
leppie
Sometimes is different than usually :)
Ryan
+3  A: 

There's a list of sort algorithms on Wikipedia. It includes categorization by execution time, stability, and allocation.

Your best bet is probably going to be modifying an efficient unstable sort to be stable, thereby making it less efficient.

davenpcj
+7  A: 

Note: standard quicksort is not O(n log n)! In the worst case, it can take up to O(n^2) time. The problem is that you might pivot on an element which is far from the median, so that your recursive calls are highly unbalanced.

There is a way to combat this, which is to carefully pick a pivot that is guaranteed, or at least very likely, to be close to the median. It is surprising that you can actually find the exact median in linear time (the median-of-medians selection algorithm), although in your case it sounds like you care about speed, so I would not suggest this.

I think the most practical approach is to implement a stable quicksort (it's easy to keep stable) but use the median of 5 random values as the pivot at each step. This makes it highly unlikely that you'll have a slow sort, and is stable.
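
As one hedged illustration of the median-of-5 idea, the pivot selection might look like this in C (the sampling details are illustrative, not from the answer; assumes rand() is seeded elsewhere):

#include <stdlib.h>

/* Return the index of the median of 5 randomly sampled elements of
   ar[lo..hi). Duplicate samples are harmless; they just bias the
   pivot slightly. */
static int median_of_5_pivot(const int *ar, int lo, int hi)
{
    int idx[5];
    for (int i = 0; i < 5; i++)
        idx[i] = lo + rand() % (hi - lo);
    /* insertion-sort the 5 sampled indices by their keys */
    for (int i = 1; i < 5; i++) {
        int t = idx[i], j = i;
        while (j > 0 && ar[idx[j - 1]] > ar[t]) {
            idx[j] = idx[j - 1];
            j--;
        }
        idx[j] = t;
    }
    return idx[2];   /* the middle sample is the median of the 5 */
}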

By the way, merge sort can be done in-place, although it's tricky to do both in-place and stable.

Tyler
Fundamentals of Algorithms, pg 237, describes a way to make quicksort O(n log n) *except* if all elements are the same. It recursively picks the median to pivot on, returning the pivoted list, which quicksort then recurses down. Having said that, I agree that the median of 5 is the best way to do it.
Michael Deardeuff
+1  A: 

There is a class of stable in-place merge algorithms, although they are complicated, and their linear merge hides a rather high constant in the O(n). To learn more, have a look at this article, and its bibliography.

Edit: the merge phase is linear, thus the mergesort is O(n log n).
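
The truly linear merges in that literature are too involved to reproduce here; as a flavor of the simpler end of the spectrum, here is a hedged C sketch of a rotation-based stable in-place merge (my sketch, not the article's algorithm). Each merge is O(n log n) rather than linear, so the resulting mergesort is O(n log^2 n):

/* reverse a[lo..hi) */
static void reverse_range(int *a, int lo, int hi)
{
    while (lo < --hi) {
        int t = a[lo]; a[lo] = a[hi]; a[hi] = t;
        lo++;
    }
}

/* exchange the adjacent blocks a[lo..mid) and a[mid..hi) via three reversals */
static void rotate_range(int *a, int lo, int mid, int hi)
{
    reverse_range(a, lo, mid);
    reverse_range(a, mid, hi);
    reverse_range(a, lo, hi);
}

/* stably merge the sorted runs a[lo..mid) and a[mid..hi) in place */
static void merge_inplace(int *a, int lo, int mid, int hi)
{
    if (lo >= mid || mid >= hi)
        return;
    if (hi - lo == 2) {                      /* two elements: swap only if out of order */
        if (a[mid] < a[lo]) { int t = a[lo]; a[lo] = a[mid]; a[mid] = t; }
        return;
    }
    int cut1, cut2;
    if (mid - lo > hi - mid) {
        /* halve the left run; lower-bound its midpoint in the right run (keeps ties stable) */
        cut1 = lo + (mid - lo) / 2;
        int l = mid, r = hi;
        while (l < r) { int m = l + (r - l) / 2; if (a[m] < a[cut1]) l = m + 1; else r = m; }
        cut2 = l;
    } else {
        /* halve the right run; upper-bound its midpoint in the left run (keeps ties stable) */
        cut2 = mid + (hi - mid) / 2;
        int l = lo, r = mid;
        while (l < r) { int m = l + (r - l) / 2; if (a[cut2] < a[m]) r = m; else l = m + 1; }
        cut1 = l;
    }
    rotate_range(a, cut1, mid, cut2);
    int newmid = cut1 + (cut2 - mid);
    merge_inplace(a, lo, cut1, newmid);
    merge_inplace(a, newmid, cut2, hi);
}

/* stable, in-place, O(n log^2 n) mergesort over a[lo..hi) */
void inplace_mergesort(int *a, int lo, int hi)
{
    if (hi - lo < 2)
        return;
    int mid = lo + (hi - lo) / 2;
    inplace_mergesort(a, lo, mid);
    inplace_mergesort(a, mid, hi);
    merge_inplace(a, lo, mid, hi);
}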

Rafał Dowgird
+1  A: 

Don't worry too much about O(n log n) until you can demonstrate that it matters. If you can find an O(n^2) algorithm with a drastically lower constant, go for it!

The general worst-case scenario is not relevant if your data is highly constrained.

In short: Run some tests.

phyzome
It's always worst case.
jjnguy
I agree with phyzome in general, big-O doesn't matter unless N has a decent chance of being large. However, what I'm trying to do is write a space-efficient associative array to fit large amounts of data in RAM, so the whole point is that N is huge.
dsimcha
+2  A: 

Because your elements are in an array (rather than, say, a linked list) you have some information about their original order available to you in the array indices themselves. You can take advantage of this by writing your sort and comparison functions to be aware of the indices:

function cmp( ar, idx1, idx2 )
{
   // first compare elements as usual
   var rc = (ar[idx1]<ar[idx2]) ? -1 : ( (ar[idx1]>ar[idx2]) ? 1 : 0 );

   // if the elements are identical, then compare their positions
   if( rc == 0 )
      rc = (idx1<idx2) ? -1 : ((idx1>idx2) ? 1 : 0);

   return rc;
}

This technique can be used to make any sort stable, as long as the sort ONLY performs element swaps. The indices of elements will change, but the relative order of identical elements will stay the same, so the sort remains stable. It won't work out of the box for a sort like heapsort, because the original heapification "throws away" the relative ordering, though you might be able to adapt the idea to other sorts.

Eric
I was going to propose the same thing.
Konrad Rudolph
This won't work for all algorithms. A sort could compare a_1 with some b, causing it to get swapped relative to some a_2 between them. You may be able to use it for some, but you have a hefty proof obligation.
wnoise
+1  A: 

Quicksort can be made stable reasonably easily, simply by adding a sequence field to each record, initializing it to the index before sorting, and using it as the least significant part of the sort key.

This has a slightly adverse effect on the time taken, but it doesn't affect the time complexity of the algorithm. It also has a minimal storage cost overhead for each record, but that rarely matters until you get very large numbers of records (and it is minimized with larger record sizes).

I've used this method with C's qsort() function to avoid writing my own. Each record has a 32-bit integer added and populated with the starting sequence number before calling qsort().

Then the comparison function checked the keys and the sequence (this guarantees there are no duplicate keys), turning the quicksort into a stable one. I recall that it still outperformed the inherently stable mergesort for the data sets I was using.
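
For illustration, a hedged C sketch of this scheme (the record layout and names are hypothetical, not the original code):

#include <stdlib.h>

/* Hypothetical record layout: the sort key plus a 32-bit sequence field. */
typedef struct {
    int      key;   /* the actual sort key */
    unsigned seq;   /* original position, filled in before sorting */
} record;

static int cmp_record(const void *pa, const void *pb)
{
    const record *a = pa, *b = pb;
    if (a->key != b->key)
        return (a->key < b->key) ? -1 : 1;
    /* equal keys: fall back to the original position; since positions are
       unique, no two distinct records ever compare equal, so the sort is stable */
    return (a->seq < b->seq) ? -1 : (a->seq > b->seq) ? 1 : 0;
}

void stable_qsort(record *recs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        recs[i].seq = (unsigned)i;          /* populate sequence numbers */
    qsort(recs, n, sizeof *recs, cmp_record);
}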

Your mileage may vary, so always remember: Measure, don't guess!

paxdiablo
A: 

Maybe I'm in a bit of a rut, but I like hand-coded merge sort. It's simple, stable, and well-behaved. The additional temporary storage it needs is only N*sizeof(int), which isn't too bad.
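
A minimal sketch of such a hand-coded merge sort, assuming int elements; the caller supplies the N-int temporary buffer:

#include <string.h>

/* Stable O(n log n) merge sort of ar[lo..hi); tmp must hold at least hi ints. */
static void msort(int *ar, int *tmp, int lo, int hi)
{
    if (hi - lo < 2)
        return;
    int mid = lo + (hi - lo) / 2;
    msort(ar, tmp, lo, mid);
    msort(ar, tmp, mid, hi);

    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)              /* take from the left run on ties: stable */
        tmp[k++] = (ar[j] < ar[i]) ? ar[j++] : ar[i++];
    while (i < mid) tmp[k++] = ar[i++];
    while (j < hi)  tmp[k++] = ar[j++];
    memcpy(ar + lo, tmp + lo, (size_t)(hi - lo) * sizeof *ar);
}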

Mike Dunlavey
A: 

There's a nice list of sorting functions on Wikipedia that can help you find whatever type of sorting function you're after.

For example, to address your specific question, it looks like an in-place merge sort is what you want.

However, you might also want to take a look at strand sort, it's got some very interesting properties.

ReaperUnreal
A: 

Quicksort can be made stable by doing it on a linked list. This costs O(n) to pick a random or median-of-3 pivot, but with a very small constant (a list traversal).

By splitting the list and ensuring that the left list is built so that equal values go left and the right list is built so that equal values go right, the sort will be implicitly stable for no real extra cost. Also, since this deals with assignment rather than swapping, I think the speed might actually be slightly better than a quicksort on an array, since there's only a single write.
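
A hedged C sketch of the idea (using the first node as the pivot for brevity, where the answer suggests a random or median-of-3 pivot to avoid the O(n^2) worst case):

#include <stddef.h>

typedef struct node {
    int key;
    struct node *next;
} node;

/* append list b after list a (either may be NULL) */
static node *concat(node *a, node *b)
{
    if (!a) return b;
    node *t = a;
    while (t->next) t = t->next;
    t->next = b;
    return a;
}

/* Stable quicksort on a singly linked list. Partitioning appends nodes
   in traversal order to the "less", "equal", and "greater" sublists, so
   equal keys keep their relative order. */
node *list_quicksort(node *head)
{
    if (!head || !head->next)
        return head;

    int pivot = head->key;
    node *less = NULL, *equal = NULL, *greater = NULL;
    node **lt = &less, **et = &equal, **gt = &greater;

    for (node *p = head, *next; p; p = next) {
        next = p->next;
        p->next = NULL;
        if (p->key < pivot)      { *lt = p; lt = &p->next; }
        else if (p->key > pivot) { *gt = p; gt = &p->next; }
        else                     { *et = p; et = &p->next; }
    }

    return concat(list_quicksort(less),
                  concat(equal, list_quicksort(greater)));
}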

So in conclusion: put all your items in a linked list and run quicksort on that.