I have a question about the quicksort algorithm. I implemented quicksort and experimented with it. The elements of the initial unsorted array are random numbers chosen from a certain range, and I found that the range of the random numbers affects the running time. For example, sorting 1,000,000 random numbers chosen from the range 1-2,000 takes 40 seconds, while it takes 9 seconds if the 1,000,000 numbers are chosen from the range 1-10,000. I don't know how to explain this. In class we talked about how the pivot value can affect the depth of the recursion tree.
In my implementation, the last element of the array is chosen as the pivot; I do not use a randomized scheme to select the pivot.

#include <vector>
#include <iostream>
#include <cstdlib>
#include <ctime>
using namespace std;

// Lomuto partition: the last element vec[r] is the pivot.
// Elements <= pivot end up to its left, larger elements to its right.
int partition(vector<int> &vec, int p, int r) {
  int x = vec[r];            // pivot value
  int i = p - 1;             // end of the "<= pivot" region
  for (int j = p; j < r; j++) {
    if (vec[j] <= x) {       // grow the "<= pivot" region and move vec[j] into it
      i = i + 1;
      int temp = vec[j];
      vec[j] = vec[i];
      vec[i] = temp;
    }
  }
  // Place the pivot between the two regions and return its index.
  int temp = vec[i + 1];
  vec[i + 1] = vec[r];
  vec[r] = temp;
  return i + 1;
}

// Sort vec[p..r] in place.
void quicksort(vector<int> &vec, int p, int r) {
  if (p < r) {
    int q = partition(vec, p, r);   // pivot ends up at index q
    quicksort(vec, p, q - 1);
    quicksort(vec, q + 1, r);
  }
}

// Fill array with num random integers in the range 1..10000
// (change the modulus to 2000 to reproduce the slower case).
void random_generator(int num, int *array) {
  srand((unsigned)time(0));
  for (int index = 0; index < num; index++) {
    array[index] = (rand() % 10000) + 1;
  }
}

int main() {
  int array_size = 1000000;
  // Allocate on the heap: a local array of 1,000,000 ints (about 4 MB) can
  // overflow the stack, and variable-length arrays are not standard C++.
  int *input_array = new int[array_size];
  random_generator(array_size, input_array);
  vector<int> vec(input_array, input_array + array_size);
  delete[] input_array;

  clock_t t1, t2;
  t1 = clock();
  quicksort(vec, 0, array_size - 1);   // call quick sort
  t2 = clock();
  float diff = (float)t2 - (float)t1;
  cout << diff << endl;
  cout << diff / CLOCKS_PER_SEC << endl;
  return 0;
}
A: 

It probably has to do with how well sorted the input is. Quicksort is O(n log n) if the input is reasonably random. If it's in reverse order, performance can degrade to O(n^2). You're probably getting closer to the O(n^2) behavior with the smaller data range.
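
For what it's worth, here's a rough sketch (not your code; the name median_of_three_partition is made up here) of how a median-of-3 pivot choice could be layered on top of the partition() from the question, to make a consistently bad pivot less likely:

// Sketch only: pick the median of the first, middle, and last elements as the
// pivot, move it into vec[r], then reuse the partition() from the question.
// Assumes #include <algorithm> for std::swap.
int median_of_three_partition(vector<int> &vec, int p, int r) {
  int mid = p + (r - p) / 2;
  // Order vec[p] <= vec[mid] <= vec[r].
  if (vec[mid] < vec[p]) swap(vec[mid], vec[p]);
  if (vec[r] < vec[p]) swap(vec[r], vec[p]);
  if (vec[r] < vec[mid]) swap(vec[r], vec[mid]);
  // vec[mid] now holds the median; move it to the pivot slot partition() expects.
  swap(vec[mid], vec[r]);
  return partition(vec, p, r);
}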

Sagekilla
the effect of which is lessened using a median of 3 pivot selection...
Mitch Wheat
I do not use median-of-3 pivot selection. I just use the last element of the array as the pivot value.
chnet
I posted my quicksort code.
chnet
+3  A: 

Most likely it's not performing well because quicksort doesn't handle lots of duplicates very well and may still result in swapping them (order of key-equal elements isn't guaranteed to be preserved). You'll notice that the number of duplicates per number is 100 for 10000 or 500 for 2000, while the time factor is also approximately a factor of 5.
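
One standard remedy for duplicate-heavy input is a three-way partition that keeps every key equal to the pivot in the middle, so neither recursive call ever looks at them again. A rough sketch (not your code; quicksort3 is just a name for the sketch):

// Sketch of a three-way ("Dutch national flag") quicksort. After partitioning,
// vec[lt..gt] holds every element equal to the pivot, so duplicates are handled
// once and excluded from both recursive calls. Assumes #include <algorithm>.
void quicksort3(vector<int> &vec, int p, int r) {
  if (p >= r) return;
  int pivot = vec[r];
  int lt = p, gt = r, i = p;
  while (i <= gt) {
    if (vec[i] < pivot)
      swap(vec[i++], vec[lt++]);
    else if (vec[i] > pivot)
      swap(vec[i], vec[gt--]);
    else
      i++;
  }
  quicksort3(vec, p, lt - 1);
  quicksort3(vec, gt + 1, r);
}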

Have you averaged the runtimes over at least 5-10 runs at each size to give it a fair shot of getting a good starting pivot?

As a comparison have you checked to see how std::sort and std::stable_sort also perform on the same data sets?
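
For example, the comparison could be as simple as the sketch below (time_std_sort is a made-up helper; it assumes the includes from your posted code plus <algorithm>):

// Sketch: time std::sort on a copy of the same data for comparison.
void time_std_sort(const vector<int> &data) {
  vector<int> copy1(data);
  clock_t s1 = clock();
  sort(copy1.begin(), copy1.end());
  clock_t s2 = clock();
  cout << "std::sort: " << (s2 - s1) / (double)CLOCKS_PER_SEC << " s" << endl;
}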

Finally, for this distribution of data (unless this is a quicksort exercise) I think counting sort would be much better - 40 KB of memory to store the counts, and it runs in O(n).
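
A rough sketch of that counting-sort idea, assuming every value lies in 1..max_value as with your generator (max_value would be 10000 or 2000 here):

// Sketch of counting sort for values known to lie in 1..max_value.
// Runs in O(n + max_value) and uses max_value + 1 extra ints for the counts.
void counting_sort(vector<int> &vec, int max_value) {
  vector<int> count(max_value + 1, 0);
  for (size_t i = 0; i < vec.size(); i++)   // tally how often each value occurs
    count[vec[i]]++;
  size_t out = 0;
  for (int v = 1; v <= max_value; v++)      // write the values back in sorted order
    for (int c = 0; c < count[v]; c++)
      vec[out++] = v;
}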

Mark B
++ I suspect you're right about the duplicates. Of course the O(n) sort wins if you can use it. The issues the OP is asking about are why I don't trust quicksort. I rely on mergesort, even though I code it myself.
Mike Dunlavey