When analyzing QS, everyone always refers to the "almost sorted" worst case. When can such a scenario occur with natural input?
The only example I came up with is re-indexing.
From Quicksort:
for quicksort, "worst case" corresponds to already sorted
A list in which all the items are the same number is already sorted.
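As a minimal illustration of why both the sorted and the all-equal case degrade, here is a sketch (my own, not from the quoted article) of the textbook Lomuto partition with the last element as the pivot. On either input the split is maximally unbalanced, which is what pushes quicksort to O(n^2):

```python
def lomuto_partition(a, lo, hi):
    """Classic Lomuto partition using a[hi] as the pivot.
    Rearranges a[lo..hi] in place and returns the pivot's final index."""
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i

# On an already sorted list -- and equally on an all-equal list --
# the pivot ends up at the far end, so one side of the split is empty
# and quicksort would recurse on n-1 elements at every level: O(n^2).
for data in (list(range(8)), [5] * 8):
    p = lomuto_partition(data, 0, len(data) - 1)
    print(p, "elements left of the pivot,", len(data) - 1 - p, "to the right")
```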
I believe that the worst case for quicksort depends on the choice of the pivot element at every step. Quicksort has its worst performance if the pivot is likely to be either the smallest or the largest element in the list (e.g. the first or last element of an already sorted list).
If, for example, you choose the middle element of the list, an already sorted list does not trigger the worst-case runtime.
So, if you suspect your scenario is likely to be a bad case for quicksort, you can simply change your choice of pivot element to make quicksort perform better.
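To make that concrete, here is a hedged sketch (the quicksort and the pivot helpers are mine, not from any particular library) that counts comparisons for a first-element pivot versus a middle-element pivot on an already sorted list:

```python
import sys

def quicksort(a, choose_pivot):
    """In-place quicksort with a pluggable pivot rule.
    choose_pivot(a, lo, hi) must return an index in [lo, hi].
    Returns the total number of element comparisons performed."""
    comparisons = 0

    def sort(lo, hi):
        nonlocal comparisons
        if lo >= hi:
            return
        p = choose_pivot(a, lo, hi)
        a[p], a[hi] = a[hi], a[p]          # move the pivot out of the way
        pivot, i = a[hi], lo
        for j in range(lo, hi):            # Lomuto partition
            comparisons += 1
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        sort(lo, i - 1)
        sort(i + 1, hi)

    sort(0, len(a) - 1)
    return comparisons

def first_element(a, lo, hi):
    return lo

def middle_element(a, lo, hi):
    return (lo + hi) // 2

sys.setrecursionlimit(10_000)              # the first variant recurses ~n deep
n = 2000
print(quicksort(list(range(n)), first_element))   # ~n^2/2 comparisons
print(quicksort(list(range(n)), middle_element))  # ~n log2(n) comparisons
```

On a sorted list of 2,000 items the first rule performs roughly two million comparisons, while the middle-element rule stays near n log n (on the order of twenty thousand).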
Note: I know that this does not give more examples of real-world occasions for quicksort worst cases. Examples of this depend on the implementation you are working with.
@swegi While Jens Ameskamp's answer is by far the most complete, astander's is, in my opinion, more relevant.
There are methods for ensuring a run time as close to O(n log n) as possible for quicksort (randomized pivot selection is the first that jumps to mind), but in the case of a mostly sorted, mostly-identical data set you will still wind up with a very slow quicksort (also, in general, merge sort will do less work than your quicksort here, but that is a different tangent).
As for where this will occur I honestly cannot say: anywhere you have large sets of sorted or nearly sorted, mostly identical data.
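Here is a sketch of that effect (my own, assuming a plain two-way Lomuto partition underneath): a random pivot rescues already sorted input, but not a mostly-identical one.

```python
import random
import sys

def quicksort_random_pivot(a):
    """Quicksort with a uniformly random pivot and a plain two-way
    (Lomuto) partition. Returns the number of comparisons performed."""
    comparisons = 0

    def sort(lo, hi):
        nonlocal comparisons
        if lo >= hi:
            return
        p = random.randint(lo, hi)         # randomized pivot selection
        a[p], a[hi] = a[hi], a[p]
        pivot, i = a[hi], lo
        for j in range(lo, hi):
            comparisons += 1
            if a[j] <= pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        sort(lo, i - 1)
        sort(i + 1, hi)

    sort(0, len(a) - 1)
    return comparisons

sys.setrecursionlimit(10_000)
n = 2000
print(quicksort_random_pivot(list(range(n))))  # ~n log n: the random pivot handles sorted input
print(quicksort_random_pivot([7] * n))         # ~n^2/2: every element equals the pivot
```

The all-equal run is slow because every key compares equal to the pivot and lands on the same side of a two-way partition, so each level only peels off the pivot itself.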
I think people are confusing Quicksort, the partition-based sorting algorithm, with "qsort", the various library implementations.
I prefer to see Quicksort the algorithm as having a pluggable pivot selection algorithm, which is quite essential in analyzing its behavior.
If the first element is always chosen as the pivot, then an already sorted list is the worst-case. Often there's a high probability that the array is already/nearly sorted, so this implementation is rather poor.
Analogously, selecting the last element as the pivot is bad for the same reason.
Some implementations try to avoid this problem by choosing the middle element as the pivot. This would not perform as badly on already/nearly sorted arrays, but one could still construct an input that would exploit this predictable pivot selection and make it run in quadratic time.
Thus, you get randomized pivot selection algorithms, but even this doesn't guarantee O(N log N).
So other algorithms were developed that use some information from the sequence before picking a pivot. You can of course scan the whole sequence, find the median, and use that as the pivot. This guarantees O(N log N), but it is of course slower in practice.
So some corners are cut, and people devised the median-of-3 algorithm. Of course, even this was later exploited by the so-called median-of-3 "killer" input.
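For reference, a minimal sketch of the median-of-3 rule (the function name is mine); it could be plugged into the pivot-selection slot of a quicksort like the one sketched earlier:

```python
def median_of_three(a, lo, hi):
    """Return the index of the median of a[lo], a[mid], a[hi].
    Cheap and good on sorted or reverse-sorted input, but deterministic,
    which is exactly what a median-of-3 "killer" input exploits."""
    mid = (lo + hi) // 2
    # Ties fall back to comparing indices, which is fine for picking a median.
    trio = sorted([(a[lo], lo), (a[mid], mid), (a[hi], hi)])
    return trio[1][1]
```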
So more attempts are made at coming up with more "intelligent" pivot selection algorithms that guarantee O(N log N) asymptotic behavior while still being fast enough to be practical, with varying degrees of success.
So really, unless one specifies a particular implementation of Quicksort, the question of when the worst case scenario occurs is ill-defined. If you use the so-called median-of-medians pivot selection algorithm, there is no quadratic worst-case scenario.
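To illustrate the idea, here is a sketch of median-of-medians pivot selection (the BFPRT, groups-of-5 construction; the names are mine and this version trades memory for clarity). It finds the true median of the subrange in worst-case linear time, so each partition is balanced and, for distinct keys, the sort stays O(N log N):

```python
def median_of_medians(a, lo, hi):
    """Return the index of the true median of a[lo..hi], found in
    worst-case linear time, for use as a quicksort pivot."""
    chunk = a[lo:hi + 1]
    value = _select(chunk, len(chunk) // 2)
    return a.index(value, lo, hi + 1)

def _select(items, k):
    """Return the k-th smallest element of items (0-based), O(n) worst case."""
    if len(items) <= 5:
        return sorted(items)[k]
    # Median of each group of 5, then recursively the median of those medians,
    # which is guaranteed to be central enough to keep the recursion linear.
    medians = [sorted(items[i:i + 5])[len(items[i:i + 5]) // 2]
               for i in range(0, len(items), 5)]
    pivot = _select(medians, len(medians) // 2)
    lows  = [x for x in items if x < pivot]
    highs = [x for x in items if x > pivot]
    n_equal = len(items) - len(lows) - len(highs)
    if k < len(lows):
        return _select(lows, k)
    if k < len(lows) + n_equal:
        return pivot
    return _select(highs, k - len(lows) - n_equal)
```

One caveat: even a perfect pivot does not help a plain two-way partition on inputs dominated by equal keys, since the equal elements all end up on one side; practical implementations group keys equal to the pivot to cover that case.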
Most library implementations, however, are likely to forfeit the O(N log N) guarantee for much faster sorting in the average case. Some of the really old implementations use the first element as the pivot, which is now well understood to be poor and is no longer a widely followed practice.