Sorting has been studied for decades, so surely the sorting algorithms provided by any programming platform (Java, .NET, etc.) must be good by now, right? Is there any reason to override something like System.Collections.SortedList?

+7  A: 

Generally no.

However, you know your data better than the people who wrote those sorting algorithms. Perhaps you could come up with an algorithm that is better than a generic algorithm for your specific set of data.

BoltBait
It's also worth mentioning that you have to ask yourself, "Even if my algorithm is faster, is it perceptible? Is it worth risking the correctness of my program or increasing cost/development time for this incremental speedup?"
Dave Markle
Amen - by all means, focus on what actually makes a difference to your app's performance!
Kevin Day
In these cases, where the language supports it, you may benefit by simply providing a custom comparator, rather than completely reimplementing a standard sorting algorithm.
Rob
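
Where the language supports it, a minimal Java illustration of the comparator point above might look like the following (the list contents are an arbitrary example); the library's algorithm stays intact and only the ordering changes:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    public class ComparatorDemo {
        public static void main(String[] args) {
            List<String> names = new ArrayList<>(
                    Arrays.asList("bob", "Alice", "carol"));
            // Reuse the library's sort; customize only the comparison.
            Collections.sort(names, String.CASE_INSENSITIVE_ORDER);
            System.out.println(names); // [Alice, bob, carol]
        }
    }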
+1  A: 

Short answer: no, except out of academic interest.

Rob
+14  A: 

There are absolutely times where your intimate understanding of your data can result in much, much more efficient sorting algorithms than any general purpose algorithm available. I shared an example of such a situation in another post on SO, but I'll share it here just to provide a case in point:

Back in the days of COBOL, FORTRAN, etc... a developer working for a phone company had to take a relatively large chunk of data that consisted of active phone numbers (I believe it was in the New York City area), and sort that list. The original implementation used a heap sort (these were 7-digit phone numbers, and a lot of disk swapping was taking place during the sort, so heap sort made sense).

Eventually, the developer stumbled on a different approach: because one, and only one, of each phone number could exist in his data set, he realized that he didn't have to store the actual phone numbers themselves in memory. Instead, he treated the entire 7-digit phone number space as a very long bit array (at 8 phone numbers per byte, 10 million phone numbers require just over a megabyte to capture the entire space). He then did a single pass through his source data and set the bit for each phone number he found to 1. Finally, he did a pass through the bit array looking for set bits and output the sorted list of phone numbers.
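
A minimal sketch of that approach in Java (BitSet and the method name are my own illustration; it works only because each number appears at most once):

    import java.util.BitSet;

    public class PhoneSort {
        // Sort 7-digit phone numbers via a bit array over the whole
        // 0..9,999,999 space (~1.25 MB). Assumes no duplicates.
        static int[] bitArraySort(int[] phoneNumbers) {
            BitSet seen = new BitSet(10_000_000);
            for (int n : phoneNumbers) {
                seen.set(n); // pass 1: mark each number's bit
            }
            int[] sorted = new int[phoneNumbers.length];
            int i = 0;
            // pass 2: walk the bit space in order, emitting set bits
            for (int n = seen.nextSetBit(0); n >= 0; n = seen.nextSetBit(n + 1)) {
                sorted[i++] = n;
            }
            return sorted;
        }
    }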

This new algorithm was much, much faster (at least 1000x faster) than the heap sort algorithm, and consumed about the same amount of memory.

I would say that, in this case, it absolutely made sense for the developer to develop his own sorting algorithm.

If your application is all about sorting, and you really know your problem space, then it's quite possible for you to come up with an application-specific algorithm that beats any general purpose algorithm.

However, if sorting is an ancillary part of your application, or you are just implementing a general purpose algorithm, chances are very, very good that some extremely smart university types have already provided an algorithm that is better than anything you will be able to come up with. Quicksort is really hard to beat if you can hold everything in memory, and heap sort is quite effective for ordering massive data sets (although I personally prefer to use B+Tree type implementations for the heap because they are tuned to disk paging performance).

Kevin Day
This is not his own sorting algorithm; it is just a different representation of the data...
Tim
The point is the same, though. He wrote his own implementation of the sort (even if the algorithm wasn't his) rather than using Library.Sort().
Bob Somers
Programming Pearls discusses this in more detail.
Alex Gartrell
Sounds like a Radix sort. This is known to be very fast against that type of data.
staticsan
Tim - Take a list of phone numbers, return them sorted. Sounds like a sorting alg to me... Bob - thanks - that's my point entirely. staticsan - this is most definitely not a radix sort (see http://en.wikipedia.org/wiki/Radix_sort). It is a recognition of unique characteristics of the data set.
Kevin Day
Basically it is a bucket sort, and the real value here is not the sort itself but how the data is stored. I stand by my comment.
Tim
Tim - fair enough - my intent was to prove a point, not give a canonical example. I agree that this is a bucket sort. Choosing a data structure that is self-sorting was a very good decision. I can't tell you how many novice-to-mid-level developers I see who use a sort to find the max value in an array...
Kevin Day
A: 

A few months ago the Coding Horror blog reported on some platform with an atrociously bad sorting algorithm. If you have to use that platform then you sure do want to implement your own instead.

Windows programmer
+2  A: 

Implementing your own sorting algorithm is akin to optimization, and as Sir Charles Antony Richard Hoare said, "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil".

David Segonds
A: 

The problem of general purpose sorting has been researched to hell and back, so worrying about that outside of academic interest is pointless. However, most sorting isn't done on generalized input, and often you can use properties of the data to increase the speed of your sorting.

A common example is counting sort. For general-purpose comparison sorting, it is proven that O(n lg n) is the best we can ever hope to do.

However, suppose we know that the values to be sorted fall within a fixed range, say [a, b]. If we create an array of size b - a + 1 (defaulting everything to zero), we can linearly scan the input, using this array to store the count of each element. This gives a linear-time sort (linear in the size of the input plus the range of the data), beating the n lg n bound, but only because we are exploiting a special property of our data. For more detail, see here.
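
A minimal sketch in Java, assuming the values fall in a known range [a, b] (method and parameter names are illustrative):

    public class CountingSort {
        static int[] countingSort(int[] input, int a, int b) {
            int[] counts = new int[b - a + 1]; // one slot per possible value
            for (int v : input) {
                counts[v - a]++; // tally each element: O(n)
            }
            int[] sorted = new int[input.length];
            int i = 0;
            for (int v = a; v <= b; v++) { // emit values in order: O(b - a)
                for (int c = 0; c < counts[v - a]; c++) {
                    sorted[i++] = v;
                }
            }
            return sorted;
        }
    }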

So yes, it is useful to write your own sorting algorithms. Pay attention to what you are sorting, and you will sometimes be able to come up with remarkable improvements.

mdkess
+2  A: 

Certain libraries (such as Java's very own Collections.sort) implement a sort based on criteria that may or may not apply to you. For example, Collections.sort uses a merge sort for its O(n log n) efficiency as well as the fact that it's a stable sort: if two elements compare as equal, the one that came first in the original collection stays in front. That is good for multi-pass sorting on different criteria (sort by date first, then by name, and the collection ends up sorted by name and, within equal names, by date). However, if you want slightly better constants or have a special data set, it might make more sense to implement your own quicksort or radix sort tailored to exactly what you want to do.
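
To make the stability point concrete, a sketch (the Person class and its accessors are hypothetical, and Comparator.comparing is newer Java syntax used for brevity):

    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    // Because Collections.sort is stable, sorting by date and then by name
    // leaves records with equal names still ordered by date.
    static void sortByNameThenDate(List<Person> people) {
        Collections.sort(people, Comparator.comparing(Person::getDate));
        Collections.sort(people, Comparator.comparing(Person::getName));
    }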

That said, all operations are fast on a sufficiently small n.

Alex Gartrell
A: 
  • You might want to multi-thread the sorting implementation (see the sketch after this list).
  • You might need better performance characteristics than quicksort's O(n log n); think bucket sort, for example.
  • You might need a stable sort while the default algorithm uses quicksort; especially for user interfaces, you'll want the sorting order to be consistent.
  • More efficient algorithms might be available for the data structures you're using.
  • You might need an iterative implementation of the default sorting algorithm because of stack overflows (e.g. when sorting large data sets).

Ad infinitum.
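
On the multi-threading point above: check for a built-in before rolling your own; for example, Java 8 added Arrays.parallelSort (the data here is arbitrary random filler):

    import java.util.Arrays;
    import java.util.Random;

    public class ParallelSortDemo {
        public static void main(String[] args) {
            long[] data = new long[10_000_000];
            Random rnd = new Random(42);
            for (int i = 0; i < data.length; i++) {
                data[i] = rnd.nextLong();
            }
            // Splits the work across the common ForkJoinPool.
            Arrays.parallelSort(data);
        }
    }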

Jasper Bekkers
A: 

If you have experience implementing sorting algorithms and understand how data characteristics influence their performance, then you would already know the answer to your question. In other words, you would already know things like: quicksort has pedestrian performance against an almost-sorted list. :-) And that if you have your data in certain structures, some sorts of sorting are (almost) free. Etc.

Otherwise, no.

staticsan