A sort is said to be stable if it maintains the relative order of elements with equal keys. I guess my question is really, what is the benefit of maintaining this relative order? Can someone give an example? Thanks.
Not all sorting is based upon the entire value. Consider a list of people. I may only want to sort them by their names, rather than all of their information. With a stable sorting algorithm, I know that if I have two people named "John Smith", then their relative order is going to be preserved.
Last    First    Phone
-----------------------------
Wilson  Peter    555-1212
Smith   John     123-4567
Smith   John     012-3456
Adams   Gabriel  533-5574
Since the two "John Smith"s are already "sorted" (they're in the order I want them), I won't want them to change positions. If I sort these items by last, then first with an unstable sorting algorithm, I could end up either with this:
Last    First    Phone
-----------------------------
Adams   Gabriel  533-5574
Smith   John     123-4567
Smith   John     012-3456
Wilson  Peter    555-1212
Which is what I want, or I could end up with this:
Last    First    Phone
-----------------------------
Adams   Gabriel  533-5574
Smith   John     012-3456
Smith   John     123-4567
Wilson  Peter    555-1212
(You see the two "John Smith"s have switched places). This is NOT what I want.
If I used a stable sorting algorithm, I would be guaranteed to get the first option, which is what I'm after.
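As a rough illustration of the table above, here is a minimal Python sketch. It relies on Python's built-in sorted(), which is documented to be stable; the tuple layout (last, first, phone) is just an assumption made for this example.

```python
# Records in their original order: (last, first, phone).
people = [
    ("Wilson", "Peter", "555-1212"),
    ("Smith", "John", "123-4567"),
    ("Smith", "John", "012-3456"),
    ("Adams", "Gabriel", "533-5574"),
]

# Sort by last name, then first name; the phone number is not part of the key.
# sorted() is stable, so the two "John Smith" rows keep their input order.
by_name = sorted(people, key=lambda p: (p[0], p[1]))

for last, first, phone in by_name:
    print(last, first, phone)
```

The 123-4567 row stays ahead of the 012-3456 row, which is the first of the two outcomes shown above.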
A priority queue is an example of where stability matters.
(3, "bill")
(1, "bob")
(1, "jane")
If you sort this from smallest to largest number, an unstable sort might do this.
(1, "jane")
(1, "bob")
(3, "bill")
But then "jane" got ahead of "bob" even though it was supposed to be the other way around.
Generally, stable sorts are useful for sorting entries by multiple keys in multiple steps.
A sorting algorithm is stable if it preserves the relative order of elements with duplicate keys.
OK, fine, but why should this be important? Well, the question of "stability" in a sorting algorithm arises when we wish to sort the same data more than once according to different keys.
Sometimes data items have multiple keys: for example, a (unique) primary key such as a social insurance number or a student identification number, and one or more secondary keys such as city of residence or lab section. We may very well want to sort such data according to more than one of the keys. The trouble is, if we sort the same data according to one key, and then according to a second key, the second sort may destroy the ordering achieved by the first sort. But this will not happen if our second sort is a stable sort.
One case is when you want to sort by multiple keys. For example, to sort a list of first name / surname pairs, you might sort first by the first name, and then by the surname.
If your sort was not stable, then you would lose the benefit of the first sort.
It means if you want to sort by Album, AND by Track Number, that you can click Track number first, and it's sorted - then click Album Name, and the track numbers remain in the correct order for each album.
It enables your sort to 'chain' through multiple conditions.
Say you have a table with first and last names in random order. If you sort by first name, and then by last name, a stable sorting algorithm will ensure that people with the same last name remain sorted by first name.
For example:
- Smith, Alfred
- Smith, Zed
Will be guaranteed to be in the correct order.
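The two-pass idea can be sketched in Python like this. The sample names other than the two Smiths are made up for the example, and the sketch again assumes a stable sort (Python's sorted() is one).

```python
names = [
    ("Zed", "Smith"),
    ("Bob", "Jones"),
    ("Alfred", "Smith"),
    ("Alice", "Jones"),
]

# Pass 1: sort by the secondary key (first name).
by_first = sorted(names, key=lambda n: n[0])

# Pass 2: sort by the primary key (last name). Because sorted() is stable,
# people with equal last names stay in the first-name order from pass 1.
by_last_then_first = sorted(by_first, key=lambda n: n[1])

for first, last in by_last_then_first:
    print(f"{last}, {first}")
```

Note the order of the passes: the least significant key is sorted first and the most significant key last.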
An example:
Say you have a data structure that contains pairs of phone numbers and employees who called them. A number/employee record is added after each call. Some phone numbers may be called by several different employees.
Furthermore, say you want to sort the list by phone number and give a bonus to the first 2 people who called any given number.
If you sort with an unstable algorithm, you may not preserve the order of callers of a given number, and the wrong employees could be given the bonus.
A stable algorithm makes sure that the right 2 employees per phone number get the bonus.
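Here is one way the bonus example might look in Python. The call log, the phone numbers, and the employee names are all invented for illustration; the only thing the sketch depends on is that sorted() is stable.

```python
from itertools import groupby

# Call log in chronological order: (phone_number, employee). Hypothetical data.
calls = [
    ("555-0100", "ann"),
    ("555-0199", "raj"),
    ("555-0100", "bo"),
    ("555-0100", "cy"),
    ("555-0199", "ann"),
]

# Stable sort by phone number only: callers of each number stay chronological.
by_number = sorted(calls, key=lambda c: c[0])

# The first two records in each group are the first two callers of that number.
bonus = {
    number: [employee for _, employee in list(group)[:2]]
    for number, group in groupby(by_number, key=lambda c: c[0])
}
print(bonus)
```

With an unstable sort, "cy" could land ahead of "ann" or "bo" within the 555-0100 group, and the slice would pick the wrong people.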
The advantage of stable sorting for multiple keys is dubious: you can always use a comparison that compares all the keys at once. It's only an advantage if you're sorting one field at a time, as when clicking on a column heading - Joe Koberg gives a good example.
Any sort can be turned into a stable sort if you can afford to add a sequence number to the record, and use it as a tie-breaker when presented with equivalent keys.
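A small sketch of the sequence-number trick, in Python. The selection_sort helper below is a hypothetical stand-in for "some unstable sort" (its long-distance swaps can reorder equal keys); it is not taken from any library.

```python
def selection_sort(items, key):
    """In-place selection sort; its long-distance swaps make it unstable."""
    for i in range(len(items)):
        smallest = min(range(i, len(items)), key=lambda j: key(items[j]))
        items[i], items[smallest] = items[smallest], items[i]
    return items

records = [(2, "a"), (2, "b"), (1, "c")]

# Sorting on the number alone: the first swap moves (2, "a") behind (2, "b").
plain = selection_sort(list(records), key=lambda r: r[0])

# Decorate each record with its original position and use that position
# as a tie-breaker whenever the real keys are equal.
decorated = [(r[0], seq, r) for seq, r in enumerate(records)]
selection_sort(decorated, key=lambda d: (d[0], d[1]))
stable = [r for _, _, r in decorated]

print(plain)
print(stable)
```

The cost is one extra integer per record plus a slightly more expensive comparison.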
The biggest advantage comes when the original order has some meaning in and of itself. I couldn't come up with a good example, but I see JeffH did so while I was thinking about it.
Let's say you are sorting an input set which has two fields, and you only sort on the first. The '|' character divides the fields.
In the input set you have many entries, but, you have 3 entries that look like
. . . AAA|towing . . . AAA|car rental . . . AAA|plumbing . . .
Now, when you get done sorting you expect all the records with AAA in them to be together.
A stable sort will give you: . . . AAA|towing AAA|car rental AAA|plumbing . . .
i.e., the three records which had the same sort key, AAA, are in the same order in the output that they were in the input. Note that they are not sorted on the second field, because you didn't sort on the second field in the record.
An unstable sort might give you: . . . AAA|plumbing AAA|car rental AAA|towing . . .
Note that the records are still sorted only on the first field, and, the order of the second field differs from the input order.
An unstable sort has the potential to be faster. A stable sort tends to mimic what people without a computer science or math background have in mind when they sort something. For instance, if you did an insertion sort with index cards you would most likely have a stable sort.
You can't always compare all the fields at once. A couple of examples: (1) memory limits, where you are sorting a large disk file and there isn't room for all the fields of all records in main memory; (2) sorting a list of base class pointers, where some of the objects may be instances of derived classes (you only have access to the base class fields).
Also, the output of a stable sort is fully determined by the input and the sort key, no matter which stable algorithm is used, which can be important for debugging and testing.