views:

539

answers:

4

If I have a large array of integers or floats, what is a good algorithm/ implementation for sorting (in C)?

It is a bit late in the game for an edit... but I am looking for correctness and speed.

+4  A: 

qsort() from the standard library is a good'un.

The comparison functions would be trivial for these cases:

int cmp_int(const void *a, const void *b)
{
    const int *ia = a;
    const int *ib = b;

    if (*ia < *ib)
        return -1;

    if (*ia > *ib)
        return 1;

    return 0;
}

int cmp_float(const void *a, const void *b)
{
    const float *fa = a;
    const float *fb = b;

    if (*fa < *fb)
        return -1;

    if (*fa > *fb)
        return 1;

    return 0;
}

(EDIT: The version of these based on subtracting b from a relies on signed overflow behaviour, so it's not a good idea.)

caf
Very handy and general purpose... but not especially fast; the comparison operation is not inlined. I wonder if the java quicksort (which I think can use inlineable comparisons) might be more competitive than glibc qsort().On the other hand, for inlined comparisons, the usort library suggested in my solution includes a general purpose introsort with inlined comparisons. It is general purpose in the sense that the comparisons are defined by macros which the #includer defines.
SetJmp
@ setjmp... then inline his solution. It is not hard to inline a function in C
Polaris878
Function pointer arguments can not get inlined in C; the compiler can not be sure what value the pointer will have at run time. At least, that is what I found last time I investigated.
SetJmp
In principle, there's no reason why the C compiler couldn't notice that a particular call to qsort always passes a constant function pointer, and behind the scenes create a magic inlined version of qsort(). I doubt there's any compilers that actually do this though, mostly because it's unlikely that the juice would be worth the squeeze.
caf
In any case, the overhead of non-inlined comparison is noted in the quicksort literature... there is a paper by Bentely and McGilroy "Engineering a Sort Function" that documents this. The USort library also contains test code to compare introsort with inlined comparisons vs GLIBC qsort (using random doubles as input):{{N introsort (secs) GLIBC (secs) x-fold speedup10000000 1.6881 2.837 1.68}}I believe the advantage of the introsort comes primarily from inlining (the introsort replaced quicksort in my code only recently).
SetJmp
There should be a line break after "speedup"... I assumed there would be opportunity to edit after posting to make the markup look good.
SetJmp
A better test would be using the same sorting implementation, just changing it from inlined comparison to indirected comparison. Then do it again with a more complicated comparison - like sorting { X , Y , Z } cartesian points by their distance from the origin.
caf
Agreed. And I think this is what the Bentley/McGilroy paper does in one of their experments. My introsort/quicksort implementation differs from the GLIBC qsort implementation in the use of textbook recursion (my version) instead of a very complex iterative representation (GLIBC).
SetJmp
A: 

It's never a bad idea to use qsort... unless you know something about the numbers.

You've tagged with radix sort. How much memory are you prepared to invest? Are the numbers inside a specific range? Do they have properties that makes radix sorting feasible?

Unless you want to use a lot of memory, qsort is a great option.

Dave Gamble
I have found that as long as the size of the array is big enough, radix sort will soundly beat qsort. For those small cases, dispatching to comparison-based sorting (with static comparisons) ensures uniform perform gains against qsort. This is the strategy of the usort implementation. As you suggest... there is memory usage: double the original array size. However, with 64 bit machines with enough virtual RAM, this will not be a problem.
SetJmp
A backtrack... for highly repetitive or presorted data... a comparison-based sorting data may be the better choice.
SetJmp
How do they perform respectively when the array is close to L2 cache size?
caf
It depends on the size of the type being sorted. For signed 8 byte integes, the difference is is the 2.5-3x range.
SetJmp
+1  A: 

For sorting arrays of numbers, consider a radix sort algorithm. When properly engineered, these sorts should provide better performance than GLIBC qsort().

The usort library contains type-specific implementations for all the major C numeric types.

http://bitbucket.org/ais/usort/wiki/Home

The speed advantage of radix sort over GLIBC qsort is about 2.5x for double precision floating point numbers at N=1000000 on my 64 bit intel laptop. However, as N grows the advantage should be even greater since radix sort is a linear time algorithm requiring constant number of passes through the data.

For very small N, the same code dispatches to an introsort or insertion sort.

SetJmp
Radix sort only works if you have a dense, ideally contiguous, list of numbers to sort. It works great if you have, say, a shuffled deck of cards; not so well if you have a scattered range of numbers with possible duplicates.
John Kugelman
Hi John - I don't understand your comment. The radix sort time is pretty deterministic compared to quicksort/introsort which will vary depending on statistics of the data. -A
SetJmp
Also, I think your examples are reversed. For shuffled deck I think it is quicksort/introsort that will be faster.
SetJmp
Ach... I am confusing myself now. It is for sorted deck I would expect quicksort to be faster. But radix sort should exhibit the less variance in sort time.
SetJmp
caf
@caf Thanks for pointing that out. As I re-look at the code I think you might be right. Unfortunately, I don't have a big endian platform around to test it out on.
SetJmp
My mistake. What I've always called "radix sorting" is in fact "pigeonhole sorting" (http://en.wikipedia.org/wiki/Pigeonhole_sort), an algorithm which works *fantastically* for decks of cards.
John Kugelman
A: 

Given a huge amount of RAM we're getting nowadays, the following set sort is possible: mark the bit in a huge RAM bit array for each number you have, then read them off back by scanning the RAM. Lots of hardware-specific optimizations can be applied for the mark and scan phases.

Alexy
The usort library uses a bucket sort for sorting single byte (signed and unsighed). This is similar to this idea. Effectively, sorting bytes can be done at a speed similar to one half of memory bandwidth.
SetJmp