views:

560

answers:

4

I have alot of different sorting algorithms which all have the following signature:

void <METHOD>_sort_ints(int * array, const unsigned int ARRAY_LENGTH);

Are there any testing suites for sorting which I could use for the purpose of making empirical comparisons?

+7  A: 

The definitive study of sorting is Bob Sedgewick's doctoral dissertation. But there's a lot of good information in his algorithms textbooks, and those are the first two places I would look for test suite and methodology. If you've had a recent course you'll know more than I do; last time I had a course, the best method was to use quicksort down to partitions of size 12, then run insertion sort on the entire array. But the answers change as quickly as the hardware.

Jon Bentley's Programming Perls books have some other info on sorting.

You can quickly whip up a test suite containing

  • Random integers

  • Sorted integers

  • Reverse sorted integers

  • Sorted integers, mildly perturbed

If memory serves, these are the most important cases for a sort algorithm.

If you're looking to sort arrays that won't fit in cache, you'll need to measure cache effects. valgrind is effective if slow.

Norman Ramsey
+3  A: 

This site shows the various sorting algorithms using four groups: http://www.sorting-algorithms.com/

In addition to the four group in Norman's answer you want to check the sorting algorithms with collection of numbers that have a few similarities in the numbers:

  • All integers are unique
  • The same integer in the whole collection
  • Few Unique Keys

Changing the number of elements in the collection is also a good practice check each algorithm with 1K, 1M, 1G etc. to see what are the memory implications of that algorithm.

Dror Helper
+10  A: 

This detailed discussion, as well as linking to a large number of related web pages you are likely to find useful, also describes a useful set of input data for testing sorting algorithms (see the linked page for reasons). Summarising:

  1. Completely randomly reshuffled array
  2. Already sorted array
  3. Already sorted in reverse order array
  4. Chainsaw array
  5. Array of identical elements
  6. Already sorted array with N permutations (with N from 0.1 to 10% of the size)
  7. Already sorted array in reverse order array with N permutations
  8. Data that have normal distribution with duplicate (or close) keys (for stable sorting only)
  9. Pseudorandom data (daily values of S&P500 or other index for a decade might be a good test set here; they are available from Yahoo.com )
ire_and_curses
+3  A: 

sortperf.py has a well selected suite of benchmark test cases and was used to support the essay found here and make timsort THE sort in Python lo that many years ago. Note that, at long last, Java may be moving to timsort too, thanks to Josh Block (see here), so I imagine they have written their own version of the benchmark test cases -- however, I can't easily find a reference to it. (timsort, a stable, adaptive, iterative natural mergesort variant, is especially suited to languages with reference-to-object semantics like Python and Java, where "data movement" is relatively cheap [[since all that's ever being moved is references aka pointers, not blobs of unbounded size;-)]], but comparisons can be relatively costly [[since there is no upper bound to the complexity of a comparison function -- but then this holds for any language where sorting may be customized via a custom comparison or key-extraction function]]).

Alex Martelli