views:

555

answers:

5

When analyzing QS, everyone always refers to the "almost sorted" worst case. When can such a scenario occur with natural input?

The only example I came up with is re-indexing.

+4  A: 

From Quicksort

for quicksort, "worst case" corresponds to already sorted

A list with all the items the same number is already sorted.

astander
Too bad, but your source is not entirely right; see Jens Ameskamp's answer
swegi
+1, since if all the numbers are the same you'd get the worst case regardless of how you choose the pivot
orip
A: 

When it's already sorted.

S.Mark
That's not entirely right; see Jens Ameskamp's answer
swegi
+6  A: 

I believe that the worst case for quicksort depends on the choice of the pivot element at every step. Quicksort has its worst performance if the pivot is likely to be either the smallest or the largest element in the list (e.g. the first or last element of an already sorted list).

If, e.g. you choose the middle element of the list, an already sorted list does not have the worst case runtime.

So, if you suspect your scenario is likely to be a bad-case scenario for quicksort, you can simply change your choice of pivot element to make quicksort perform better.

Note: I know that this does not give more examples of real-world occasions for quicksort worst cases. Examples of this depend on the implementation you are working with.
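
For illustration, a minimal Python sketch of this idea (an out-of-place toy implementation, not any particular library's code):

```python
# Toy quicksort with a pluggable pivot choice (illustrative sketch only).
def quicksort(items, pick_pivot):
    if len(items) <= 1:
        return items
    pivot = pick_pivot(items)
    smaller = [x for x in items if x < pivot]
    equal   = [x for x in items if x == pivot]
    larger  = [x for x in items if x > pivot]
    return quicksort(smaller, pick_pivot) + equal + quicksort(larger, pick_pivot)

pick_first  = lambda items: items[0]                  # worst case on an already sorted list
pick_middle = lambda items: items[len(items) // 2]    # behaves well on an already sorted list
```

With pick_first, an already sorted list leaves the "smaller" partition empty at every step, giving quadratic behaviour; switching to pick_middle avoids exactly that case.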

Jens
or you use something like median-of-3 to get a relatively well-chosen pivot
swegi
or you use a random element. That only goes wrong with a very (very, very) small probability, independent of possibly obscure inputs.
Jens
The "already sorted" meme's perhaps so prevalent because lots of people take the first element as the pivot, on the assumption that the list is unsorted. In the case of a sorted list, that's the worst element you could choose.
Frank Shearar
A list in reverse sorted order would be the worst case if you choose the first element as pivot. One has to choose the last element to make the already sorted case the worst case. Note: The questioner asks for the "almost sorted" case, which is the worst case only with high probability even if you choose the last element. It could be (with small probability) that the median is the last element, which means that almost sorted can as well be the best case.
swegi
@swegi: The problem occurs when the current subarray is not sufficiently evenly divided for recursion. It does not matter which extreme (largest or smallest) pivot is chosen; as long as it is extreme, you get worst case behaviour.
Svante
@Svante: You're completely right, thanks for pointing that out.
swegi
The worst case can also be caused by repeated elements, depending on the quicksort implementation. If the quicksort partitions into [ < p | >= p ] at each step, then having all elements the same will lead to worst-case behavior no matter what pivot is chosen, because one partition, [ < p ], will have zero elements each time. A quicksort that partitions into [ <= p | >= p ] will also have this problem. There are modifications to these quicksorts that overcome this difficulty.
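
A minimal sketch of such a modification (a smaller/equal/larger, i.e. three-way, partition; illustrative only):

```python
# Illustrative three-way partition around a pivot value.  Elements equal to the
# pivot end up in the middle section, so an all-equal array is finished in one
# pass instead of producing an empty partition at every step.
# lo is the first index of the range, hi is one past the last index.
def partition3(a, lo, hi, pivot):
    lt, i, gt = lo, lo, hi   # invariant: a[lo:lt] < pivot, a[lt:i] == pivot, a[gt:hi] > pivot
    while i < gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt, i = lt + 1, i + 1
        elif a[i] > pivot:
            gt -= 1
            a[i], a[gt] = a[gt], a[i]
        else:
            i += 1
    return lt, gt            # recurse only on a[lo:lt] and a[gt:hi]
```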
Justin Peel
I think I didn't phrase my question well enough. I didn't mean to ask what sort of input is QS's worst case. I assumed that it is the "almost sorted" case, and the question is: "when do we get an almost sorted input in real-life situations?" Sorry for the confusion, and thanks for the answers :)
Shira
+1  A: 

@swegi

While Jens Ameskamp's answer is by far the most complete, astander's is, in my opinion, more relevant.

There are methods for ensuring a run time as close to O(n log(n)) as possible for Quick Sort (randomized pivot selection is the first that jumps to mind), but in the case of a mostly sorted and mostly identical set of data you will wind up with a very slow Quick Sort (also, generally, merge sort will do less work than your quick sort, but that is a different tangent).

As for where this will occur, I cannot honestly say; anywhere in which you have large sets of sorted / nearly sorted, mostly identical data.

Mimisbrunnr
It depends on how you implement the partitioning. With the usual smaller-equal-larger partitioning, you will collect all (or almost all) elements in the equal bucket. This makes the mostly equal data case one of the best cases for `QuickSort`, independently from the ordering of the elements.
swegi
It has been so long since I've utilized a QuickSort that I didn't consider the smaller-equal-larger partitioning. Thanks for pointing that out.
Mimisbrunnr
Any smarter pivot selection will fix the "already sorted" list worst case - even randomized pivot selection will reduce this probability to nil. On real-world applications, pivot selections such as median-of-three usually prove superior.
orip
Also, algorithms like merge sort, heap sort, or even quicksort with ideal pivot selection (which can be done in O(n)) guarantee O(n log(n)) but usually have larger constants, making quicksort faster in practice. In practice a hybrid approach is usually used: Introsort, for example, uses quicksort but switches to heapsort when a bad input is detected. Also, for small arrays O(n^2) algorithms like insertion sort are faster because of their low constants, so the recursion switches to a low-overhead O(n^2) sort when it reaches small ranges.
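
A rough sketch of that hybrid shape in Python (illustrative only, not Introsort's actual code; real implementations sort in place):

```python
import heapq
import math
import random

# Quicksort with a recursion-depth budget: when the budget runs out the range
# falls back to heapsort, and very small ranges use insertion sort.
def hybrid_sort(items, depth=None):
    if depth is None:
        depth = 2 * max(1, int(math.log2(max(len(items), 2))))
    if len(items) <= 16:                  # small range: low-overhead insertion sort
        return insertion_sort(list(items))
    if depth == 0:                        # "bad input detected": switch to heapsort
        heap = list(items)
        heapq.heapify(heap)
        return [heapq.heappop(heap) for _ in range(len(heap))]
    pivot = random.choice(items)
    smaller = [x for x in items if x < pivot]
    equal   = [x for x in items if x == pivot]
    larger  = [x for x in items if x > pivot]
    return hybrid_sort(smaller, depth - 1) + equal + hybrid_sort(larger, depth - 1)

def insertion_sort(a):
    for i in range(1, len(a)):
        x, j = a[i], i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a
```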
orip
+5  A: 

I think people are confusing Quicksort, the partition-based sorting algorithm, with "qsort", the various library implementations.

I prefer to see Quicksort the algorithm as having a pluggable pivot selection algorithm, which is quite essential in analyzing its behavior.

If the first element is always chosen as the pivot, then an already sorted list is the worst-case. Often there's a high probability that the array is already/nearly sorted, so this implementation is rather poor.

Analogously, selecting the last element as the pivot is bad for the same reason.

Some implementations try to avoid this problem by choosing the middle element as the pivot. This would not perform as badly on already/nearly sorted arrays, but one could still construct an input that would exploit this predictable pivot selection and make it run in quadratic time.

Thus, you get randomized pivot selection algorithms, but even this doesn't guarantee O(N log N).

So other algorithms were developed that would use some information from the sequence before picking a pivot. You can of course scan the whole sequence, find the median, and use that as the pivot. This guarantees O(N log N), but is of course slower in practice.

So some corners are cut, and people devised the median-of-3 algorithm. Of course, later even this was exploitable by the so-called median-of-3 "killer".
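
For concreteness, a sketch of median-of-3 pivot selection (illustrative only):

```python
# Return the index of the median of the first, middle, and last elements of
# a[lo..hi] (hi is the index of the last element), for use as the pivot.
# This behaves well on sorted and reverse-sorted input, but crafted
# "median-of-3 killer" inputs can still force quadratic behaviour.
def median_of_three(a, lo, hi):
    mid = (lo + hi) // 2
    x, y, z = a[lo], a[mid], a[hi]
    if (x <= y <= z) or (z <= y <= x):
        return mid
    if (y <= x <= z) or (z <= x <= y):
        return lo
    return hi
```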

So more attempts are made at coming up with more "intelligent" pivot selection algorithms that guarantee O(N log N) asymptotic behavior while still being fast enough to be practical, with varying degrees of success.

So really, unless one specifies a particular implementation of Quicksort, the question of when the worst case scenario occurs is ill-defined. If you use the so-called median-of-medians pivot selection algorithm, there is no quadratic worst-case scenario.

Most library implementations, however, are likely to forfeit the O(N log N) guarantee for much faster sorting in the average case. Some of the really old implementations use the first element as the pivot, which is now well understood to be poor and is no longer a widely followed practice.

polygenelubricants
+1, nice answer
swegi