views:

387

answers:

4

Hi.

I think that QuickSort in some specific conditions may cause a stack overflow exception.

There are two basic ways of selecting the pivot element during the sort process - the pivot value can be the element in the middle of the sorted range or the element chosen randomly (within the sorted range). Is the second method (random) less stack overflow prone than the first one? Could you please advise me?

Here is my version of quick sort (Delphi):

procedure QuickSort(lLowBound, lHighBound: integer; lCompare: TListSortCompare;
  lSwap: TListSortSwap);

  procedure Sort(lLowIndex, lHighIndex: integer);
  var
    lLeft: Integer;
    lRight: Integer;
    lPivot: Integer;
    lLeftCompare: Integer;
    lRightCompare: Integer;
  begin
    repeat
      lLeft := lLowIndex;
      lRight := lHighIndex;
      lPivot := (lLowIndex + lHighIndex) div 2; //the pivot as the element in the middle
      //lPivot := lLowIndex + Random(lHighIndex - lLowIndex + 1); //the pivot chosen randomly
      repeat
        lLeftCompare := lCompare(lLeft, lPivot);
        while lLeftCompare < 0 do
        begin
          Inc(lLeft);
          lLeftCompare := lCompare(lLeft, lPivot);
        end;
        lRightCompare := lCompare(lRight, lPivot);
        while lRightCompare > 0 do
        begin
          Dec(lRight);
          lRightCompare := lCompare(lRight, lPivot);
        end;

        if lLeft <= lRight then
        begin
          if not ((lLeftCompare = 0) and (lRightCompare = 0)) then
          begin
            lSwap(lRight, lLeft);

            if lPivot = lLeft then
              lPivot := lRight
            else if lPivot = lRight then
              lPivot := lLeft;
          end;
          Inc(lLeft);
          Dec(lRight);
        end;
      until lLeft > lRight;

      if (lLowIndex < lRight) then
        Sort(lLowIndex, lRight);

      lLowIndex := lLeft;
    until lLeft >= lHighIndex;
  end;

begin
  if lHighBound > lLowBound then
    Sort(lLowBound, lHighBound);
end;

Thanks for your advice in advance!

Mariusz.

+1  A: 

A probabilistic way to improve efficiency is to pick 3 random elements and use the middle value (the one that is not the greatest nor the least).

You can also use a stack of records to push and pop the bounds and write a loop instead of making recursive calls (also it will use less memory since the pointer to the array won't need to be replicated for all the calls).

EDIT: I've noticed that the internal procedure doesn't take the pointer as parameter, so forget that part ^_^ anyway, the stack frame has more information than just the parameters of the function, so it will be still more memory efficient (and the main point was that the heap were the data stack is allocated is usually larger than the process stack).

fortran
Thanks for your advice. I managed to make an iterational version of quick sort the way you suggested.
Mariusz
+4  A: 

Using any element at a specific index (first, last or middle) as the pivot element always incurs the risk of degeneration with specific data sets. First and last element are particularly bad because they degenerate with presorted (or nearly presorted) data, which is quite common. The middle element is less problematic in parctice, but still vulnerable to maliciously constructed datasets.

Using a random element means degeneration can only happen through pure bad luck (assuming that the RNG is not predictable by a hypothetical attacker), so it's a good tactic. A further improvement that significantly reduces the likelihood of being hit by that bad luck would be to use the median of 3 (or 5, or more) randomly chosen elements, but it has to be weighted against the additional complexity and running time this incurs.

Michael Borgwardt
A: 

Thanks for your answers.

Fortran, thanks for your suggestions concerning making a non-recursive method. Basing on them, I managed to make an iterational quick sort and it seems to be working properly :).

Here's the code:

procedure QuickSortI(lLowBound, lHighBound: integer; lCompare: TListSortCompare;
  lSwap: TListSortSwap);
var
  lLeft: Integer;
  lRight: Integer;
  lPivot: Integer;
  lLeftCompare: Integer;
  lRightCompare: Integer;
  lStack: array of integer;
  lStackLen: integer;
begin
  if lHighBound > lLowBound then
  begin
    lStackLen := 2;
    SetLength(lStack, lStackLen);
    lStack[lStackLen - 1] := lLowBound;
    lStack[lStackLen - 2] := lHighBound;

    repeat
      lLowBound := lStack[lStackLen - 1];
      lHighBound := lStack[lStackLen - 2];
      SetLength(lStack, lStackLen - 2);
      Dec(lStackLen, 2);

      lLeft := lLowBound;
      lRight := lHighBound;
      lPivot := (lLowBound + lHighBound) div 2;
      repeat
        lLeftCompare := lCompare(lLeft, lPivot);
        while lLeftCompare < 0 do
        begin
          Inc(lLeft);
          lLeftCompare := lCompare(lLeft, lPivot);
        end;
        lRightCompare := lCompare(lRight, lPivot);
        while lRightCompare > 0 do
        begin
          Dec(lRight);
          lRightCompare := lCompare(lRight, lPivot);
        end;

        if lLeft <= lRight then
        begin
          if not ((lLeftCompare = 0) and (lRightCompare = 0)) then
          begin
            lSwap(lRight, lLeft);

            if lPivot = lLeft then
              lPivot := lRight
            else if lPivot = lRight then
              lPivot := lLeft;
          end;
          Inc(lLeft);
          Dec(lRight);
        end;
      until lLeft > lRight;

      if (lHighBound > lLeft) then
      begin
        Inc(lStackLen, 2);
        SetLength(lStack, lStackLen);
        lStack[lStackLen - 1] := lLeft;
        lStack[lStackLen - 2] := lHighBound;
      end;

      if (lLowBound < lRight) then
      begin
        Inc(lStackLen, 2);
        SetLength(lStack, lStackLen);
        lStack[lStackLen - 1] := lLowBound;
        lStack[lStackLen - 2] := lRight;
      end;

    until lStackLen = 0;
  end;
end;

I hope I implemented it in an optimal way. I used a dynamic array to store the sort boundaries (each pair of items is the low and high bound).

This iterational method seems to be slightly slower than the recursive one, but I think it does not matter so much.

If you notice a mistake or you know a way of optimizing the method, I will be grateful if you let me know.

Thanks!

Mariusz.

Mariusz
A: 

A decent quicksort implementation uses O(log n) stack space. It achieves that by sorting the smallest subarray first. Worst case if you do not do that is the situation where the pivot is the largest element and you try sorting a subarray that is only one smaller each time. This occurs when you use already sorted data as input and take as a pivot the right element.

Your explicit stack implementation is slower and suffers from the same problem (though it now is heap instead of stack).

Another thing missing is a switch to insertion sort when the subarray is small (5-25 elements). Also take a look at the dual-pivot quicksort questions on this site.

Stephan Eggermont