views:

266

answers:

4

The function is defined as

void bucketsort(Array& A){
  size_t numBuckets=A.size();
  iarray<List> buckets(numBuckets);

  //put in buckets
  for(size_t i=0;i!=A.size();i++){
    buckets[int(numBuckets*A[i])].push_back(A[i]);
  }

  ////get back from buckets
  //for(size_t i=0,head=0;i!=numBuckets;i++){
  //size_t bucket_size=buckets[i].size();
  //for(size_t j=0;j!=bucket_size;j++){
  //  A[head+j] = buckets[i].front();
  //  buckets[i].pop_front();
  //}
  //head += bucket_size;
  //}
  for(size_t i=0,head=0;i!=numBuckets;i++){
    while(!buckets[i].empty()){
      A[head] = buckets[i].back();
      buckets[i].pop_back();
      head++;
    }
  }

  //insertion sort
  insertionsort(A);
}

where List is just list<double> from the STL.

The contents of the array are generated randomly in [0,1). Theoretically, bucket sort should be faster than quicksort for large sizes since it is O(n), but it fails, as the following graph shows.

[graph: runtime of bucket sort vs. quicksort]

I used google-perftools to profile it on an array of 10,000,000 doubles. It reports the following:

[profiler output]

It seems I should not use the STL list, but I wonder why. What does `std::_List_node_base::_M_hook` do? Should I write a list class myself?

PS: The experiment and improvement

I tried keeping only the code that puts elements into buckets, and this showed that most of the time is spent building up the buckets.
I made the following improvements:

- Use an STL vector as the bucket type and reserve a reasonable amount of space for each bucket.
- Use two helper arrays to store the information needed to build the buckets, thus avoiding linked lists entirely, as in the following code:

void bucketsort2(Array& A){
  size_t    numBuckets = ceil(A.size()/1000.0);//floating-point division; with integer division the ceil is a no-op
  Array B(A.size());
  IndexArray    head(numBuckets+1,0),offset(numBuckets,0);//extra end of head is used to avoid checking of i == A.size()-1

  for(size_t i=0;i!=A.size();i++){
    head[int(numBuckets*A[i])+1]++;//Note the +1
  }
  for(size_t i=2;i<numBuckets;i++){//head[1] is right already
    head[i] += head[i-1];
  }

  for(size_t i=0;i<A.size();i++){
    size_t  bucket_num         = int(numBuckets*A[i]);
    B[head[bucket_num]+offset[bucket_num]] = A[i];
    offset[bucket_num]++;
  }
  A.swap(B);

  //insertionsort(A);
  for(size_t i=0;i<numBuckets;i++)
    quicksort_range(A,head[i],head[i]+offset[i]);
}

The results are in the following graph [graph: runtime comparison], where lines starting with "list" use lists as buckets, lines starting with "vector" use vectors as buckets, and lines starting with "2" use the helper arrays. By default insertion sort is used at the end; some runs use quicksort where the bucket size is big.
Note that "list" and "list, only put in" nearly overlap, as do "vector, reserve 8" and "vector, reserve 2".
I will try small sizes with enough memory reserved.

+1  A: 

Linked lists are not arrays. They are substantially slower at operations like lookup. The STL sort may well have a specific version for lists that takes this into account and optimizes for it, but your function blindly ignores which container it's using. You should try using an STL vector as your array.
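A minimal sketch of that suggestion, assuming `Array` is a `std::vector<double>` with values in [0,1) as in the question (`bucketsort_vec` and the final `std::sort` are illustrative stand-ins, not the asker's actual code):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch: the same bucketing pass, but with vector buckets instead of lists.
void bucketsort_vec(std::vector<double>& A) {
    std::size_t numBuckets = A.size();
    std::vector<std::vector<double>> buckets(numBuckets);

    // put in buckets (values assumed to lie in [0,1))
    for (std::size_t i = 0; i != A.size(); ++i)
        buckets[static_cast<std::size_t>(numBuckets * A[i])].push_back(A[i]);

    // get back from buckets, in bucket order
    std::size_t head = 0;
    for (std::size_t i = 0; i != numBuckets; ++i)
        for (double v : buckets[i])
            A[head++] = v;

    std::sort(A.begin(), A.end()); // stand-in for the question's insertionsort
}
```

Vectors keep each bucket's elements contiguous, so the put-in and read-out passes touch far fewer cache lines than per-node list traversal.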

DeadMG
The STL list is used just as a bucket; only push_back, front and pop_front are used, which should take constant time. And the only sort is insertionsort(A), where A is actually an array of doubles.
luoq
@luoq: You are calling `size()` on the lists, which is O(N) rather than O(1).
Oli Charlesworth
@Oli Charlesworth : should be O(1) in a good implementation. Also, he has as many buckets as elements in the array, so each list should have length ~1 if the distribution is uniform.
Loïc Février
@Oli Charlesworth: The complexity of size is "Constant (recommended). Linear in some implementations." So this may make the function slow. But even if it's O(n), the total extra time should be O(size of A). Whatever, I'll try not using size.
luoq
@luoq : you can use a `while(buckets[i].empty())` ;)
Loïc Février
@Loïc Février :I'm working on it
luoq
@luoq See my answer. It's of course `while(!buckets[i].empty())`...
Loïc Février
Using while(!buckets[i].empty()) instead of size(), the time is roughly the same. It seems a little faster with large n.
luoq
@Loic: O(1) `size()` conflicts with O(1) `splice()`, so maybe, maybe not. The GNU implementation is an example with O(N) time (it calls `std::distance()`).
Oli Charlesworth
@Oli Charlesworth : I don't see any conflict. splice works with iterators and should not need size(). Where is the conflict? Note that the third version of `splice` is not O(1). See http://www.cplusplus.com/reference/stl/list/splice/
Loïc Février
@Loic: Precisely. Some implementers choose O(1) `size`, some choose O(1) `splice(iterator, x, first, last)`. MSVC went for the former, GNU for the latter.
Oli Charlesworth
Alright. There was no indication that `splice` could be O(1) in some implementations.
Loïc Février
@Loïc Février: check out the SGI docs; they document function complexity. If you look at the "new members" section, all versions of `splice`, including the "range" version, are documented as "This function is constant time." To be constant time, it must not iterate over that range! See http://www.sgi.com/tech/stl/List.html
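A small sketch of the trade-off under discussion (the function name here is just illustrative): the range overload of `splice` relinks a few pointers and never visits the moved nodes, so an implementation that keeps it O(1) cannot maintain a node count for `size()`.

```cpp
#include <iterator>
#include <list>
#include <utility>

// Sketch: move the middle of one list into another without copying nodes.
std::pair<std::list<int>, std::list<int>> demo_splice() {
    std::list<int> a = {1, 2, 3, 4, 5};
    std::list<int> b;

    auto first = std::next(a.begin());     // points at 2
    auto last  = std::next(a.begin(), 4);  // points at 5

    // relink [2,3,4] into b; no element is copied or visited
    b.splice(b.begin(), a, first, last);

    return {a, b}; // a is now {1, 5}, b is {2, 3, 4}
}
```

After the splice, either the implementation recounts nodes for `size()` (GNU's choice at the time) or it walks the range during the splice (MSVC's choice).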
André Caron
@André Caron : Thanks !
Loïc Février
+1  A: 

With

iarray<List> buckets(numBuckets);

you are basically creating a LOT of lists, and that can cost you a lot, especially in memory access, which is theoretically linear but not in practice.

Try to reduce the number of buckets.

To verify my assertion, analyse your code's speed with only the creation of the lists.
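For instance, a minimal timing sketch with `<chrono>` (the function name is hypothetical) that measures only the construction of the buckets:

```cpp
#include <chrono>
#include <cstddef>
#include <list>
#include <vector>

// Sketch: time only the construction of the (empty) buckets.
double time_bucket_creation(std::size_t numBuckets) {
    auto t0 = std::chrono::steady_clock::now();
    std::vector<std::list<double>> buckets(numBuckets);
    auto t1 = std::chrono::steady_clock::now();
    (void)buckets; // keep the container alive until after t1
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Calling this with the same numBuckets as the sort isolates the list-construction cost from the put-in and get-back passes.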

Also, to iterate over the elements of the lists you should not use .size(), but rather:

//get back from buckets
for(size_t i=0,head=0;i!=numBuckets;i++)
  while(!buckets[i].empty())
  {
    A[head++] = buckets[i].front();
    buckets[i].pop_front();
  }

In some implementations .size() can be in O(n). Unlikely but...


After some research I found this page explaining what the code for std::_List_node_base::_M_hook is.

It seems it is only for inserting an element at a given place in a list. It shouldn't cost a lot.

Loïc Février
size() seems constant in my environment (GCC). I'll try your first idea.
luoq
Even if constant, the "correct" way to take all elements from a list is with empty/front/pop.
Loïc Février
It cannot explain that much time. I ran the function with just the initialization of the buckets (keeping only the first two lines); from n=1000 to 4096000, the runtime is 2%-5% of the original.
luoq
Ok. Try that + "put in buckets" and not "get back", to see which lines are the problem.
Loïc Février
I will try using an array to store the sizes of the buckets and copying the array directly into a new one according to that size information, thus avoiding linked lists entirely. I will check the run time then.
luoq
@Loïc Février: Direct and good idea
luoq
@luoq : since we do not know what std::_List_node_base::_M_hook is, a simple thing to do would be to see where it is used.
Loïc Février
That's it. After adding "put in buckets" the time is in fact longer than doing the complete sort, maybe due to some random factor.
luoq
@luoq : ok, but that does not explain why. No cache hits in memory access, maybe? Have you tried push_front? Because pop() does not seem to take much time; it should cost you as much as push_back.
Loïc Février
It's std::_List_node_base::_M_hook instead of std_List_node_base_M_hook, which is defined (or declared) at line 87 of /usr/include/c++/4.5.1/bits/stl_list.h with GCC 4.5.1. But I cannot understand that file.
luoq
See the update to my answer. You have the code of the function. Basically you have a doubly-linked list, and that function updates the next/prev pointers of the elements when a new one is inserted.
Loïc Février
push_front doesn't make much difference. The difference between push and pop may be that push requires allocation of memory, which somehow does not take constant time. I will try using vector as the bucket type and reserving some space for each.
luoq
Using vector as buckets makes the code faster; it seems O(n) now.
luoq
@luoq: that's normal; `list` is the slowest container in the STL. It's only useful for its iterator-invalidation properties (and, from time to time, the `splice` operation). Don't use it if possible.
Matthieu M.
+1  A: 

I think perhaps the interesting question is, Why are you creating an inordinately large number of buckets?

Consider the input {1,2,3}, numBuckets = 3. The loop containing buckets[int(numBuckets*A[i])].push_back(A[i]); is going to unroll to

buckets[3].push_back(1);  
buckets[6].push_back(2);  
buckets[9].push_back(3);  

Really? Ten buckets (indices 0 through 9) for three values...

Consider if you passed a permutation of the range 1..100. You'd create 10,000 buckets and only use 1% of them. ... and each of those unused buckets requires creating a List in it. ... and has to be iterated over and then discarded in the readout loop.

Even more exciting, sort the list 1..70000 and watch your heap manager explode trying to create 4.9 billion Lists.
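For arbitrary input, one way to guarantee that exactly `numBuckets` buckets suffice is to normalize values into [0,1) first; a sketch (`bucket_index` is an illustrative helper, not from the question):

```cpp
#include <algorithm>
#include <cstddef>

// Sketch: map any value in [lo, hi] onto a valid bucket index.
std::size_t bucket_index(double x, double lo, double hi,
                         std::size_t numBuckets) {
    double scaled = (x - lo) / (hi - lo);              // now in [0, 1]
    std::size_t idx = static_cast<std::size_t>(scaled * numBuckets);
    return std::min(idx, numBuckets - 1);              // guard x == hi
}
```

With this, the input {1,2,3} and numBuckets = 3 lands in buckets 0, 1 and 2 rather than 3, 6 and 9.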

Eric Towers
The contents of the array are generated randomly in [0,1), so just numBuckets buckets are created and every element can be put into one.
luoq
+1  A: 

In my opinion, the biggest bottleneck here is memory management functions (such as new and delete).

Quicksort (of which the STL probably uses an optimized version) can sort an array in place, meaning it requires no heap allocations at all. That is why it performs so well in practice.

Bucket sort relies on additional working space, which is assumed to be readily available in theory (i.e. memory allocation is assumed to take no time at all). In practice, memory allocation can take anywhere from (large) constant time to linear time in the size of memory requested (Windows, for example, will take time to zero the contents of pages when they are allocated). This means standard linked list implementations are going to suffer, and dominate the running time of your sort.

Try using a custom list implementation that pre-allocates memory for a large number of items, and you should see your sort running much faster.
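A minimal sketch of that idea with vector buckets (names are illustrative): reserve each bucket's expected share up front, so `push_back` rarely allocates.

```cpp
#include <cstddef>
#include <vector>

// Sketch: pre-allocate each bucket's expected share of a uniform input.
std::vector<std::vector<double>>
make_buckets(std::size_t n, std::size_t numBuckets) {
    std::vector<std::vector<double>> buckets(numBuckets);
    std::size_t expected = n / numBuckets + 1; // assumes ~uniform distribution
    for (auto& b : buckets)
        b.reserve(expected); // one allocation per bucket, none per element
    return buckets;
}
```

Buckets that overflow `expected` still grow geometrically, so only the occasional outlier bucket reallocates.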

casablanca
I have tried using vector as buckets (using push_back/pop_back, and reserving space for two doubles); the code runs faster than with list, but putting elements into buckets still consumes most of the time. The problem is that some buckets will hold much more than others. Pre-allocating for that in every bucket would waste a lot of memory and time, and for now I don't know the distribution of the size of the largest bucket.
luoq
Those are exactly the conditions that bucket sort requires to perform well: it assumes you have enough extra space readily available.
casablanca
And that is also the reason why bucket sort is impractical for large data sets.
casablanca