views:

125

answers:

5

What is the fastest way to sort the values in a smooth 2D array?

The input is a small filtered image:

  • about 60 by 80 pixels
  • single channel
  • single or double precision float
  • row major storage, sequential in memory
  • values have mixed sign
  • piecewise "smooth", with regions on the order of 10 pixels wide

Output is a flat (about 4800 value) array of the sorted values, along with the indices that sort the original array.

A: 

There is timsort, but I have seen in several places that it is meant for applications with slow compares; the numpy devs apparently decided not to even bother implementing it:

http://mail.scipy.org/pipermail/scipy-dev/2009-May/011929.html

Drew Wagner
A: 

One could mergesort the rows individually, and then merge the sorted rows.

That would at least leverage some of the special structure of the 2D array, i.e. the fact that monotonic runs will typically start and stop at the edge of the array. It also exposes another couple levels of parallelism.

Drew Wagner
+1  A: 

I'd start with in-place quicksort. Floating point comparisons are fast on most processors (certainly a lot faster than the allocation needed for a mergesort).

Robert Fraser
+1  A: 

I hammered out a quick and dirty benchmark on some images using numpy's sort routines on the flat array. This is averaged over a few hundred random images and a few hundred images of human faces. Both are single precision.

On random images...
quicksort took 0.000153 seconds per image.
mergesort took 0.000170 seconds per image.
heapsort took 0.000241 seconds per image.
On real images...
quicksort took 0.000136 seconds per image.
mergesort took 0.000143 seconds per image.
heapsort took 0.000230 seconds per image.

All of the algorithms seem to benefit from the existing partial ordering, especially quicksort. Numpy doesn't seem to have a sorted list merge function, so I can't try pre-sorting the rows, alas.

Drew Wagner
+2  A: 

I'd expect Timsort to win this since it takes advantage of "runs" in data.

Quicksort will normally be be fast but there is a risk that you hit a worst-case scenario. e.g. some versions of quickshort are O(n^2) when given already sorted input. Which would not be very friendly if someone gave you the wrong type of gradient-filled image......

Here's a slightly crazy idea - you might also try a Z-ordering pass (Wikipedia link) which could enable you to take advantage of adjacent similar colours in both dimensions.

mikera
The Z-ordering is a neat idea... There are still "good" and "bad" directions, but I think in all cases it will help improve the global ordering, even if the local ordering can be arbitrarily bad. Thanks!
Drew Wagner
No probs. I've found Z-ordering to be oddly useful in lots of strange situations, particularly when any kind of 2+ dimensional data is involved.
mikera