views:

385

answers:

9

I have an array of N * L integers

int array[N * L];

I want to divide the array in N blocks of L elements and sort the blocks according to the classic list comparison (compare first element, than second, etc).

In C I can do:

qsort(array, N, sizeof(int) * L, comp);

How can I achieve the same result using C++ std::sort(...) which is, presumably, faster?

Update: The array must stay a C array like above. Can't use std::vector<>. My data comes through comes from MPI_Recv(...).

Update: My problem is that L is variable and with qsort() version I need to make L global (and I say goodbye to thread safety). std::sort() allows me to pass a functor which stores L at construction point.

NOTE: The qsort function implements a quick-sort algorithm to sort an array of N elements, each of L bytes.

A: 
std::array< std::array<int, L>, N > array;
// or std::vector< std::vector<int> > if N*L is not a constant
std::sort( array.begin(), array.end() );
Kirill V. Lyadvinsky
Can't use std::vector<>. I've updated the question.
Alexandru
Then use `qsort`, using std::sort is not a faster way.
Kirill V. Lyadvinsky
A: 
namespace
{
    struct NewCompare
    {
     bool operator()( const int a, const int b ) const
     {
      return a < b;
     }

    };
}

std::sort(array+start,array+start+L,NewCompare);

Do test with std::stable_sort() on realistic data-sets - for some data mixes its substantially faster!

On many compilers (GCC iirc) there's a nasty bite: the std::sort() template asserts that the comparator is correct by testing it TWICE, once reversed, to ensure the result is reversed! This will absolutely completely kill performance for moderate datasets in normal builds. The solution is something like this:

#ifdef NDEBUG
  #define WAS_NDEBUG
  #undef NDEBUG
#endif
#define NDEBUG
#include <algorithm>
#ifdef WAS_NDEBUG
  #undef WAS_NDEBUG
#else
  #undef NDEBUG
#endif

Adapted from this excellent blog entry: http://www.tilander.org/aurora/2007/12/comparing-stdsort-and-qsort.html

Will
i don't get it. how does this handle the block nature of array?
Matt Joiner
+3  A: 

You can use a "stride iterator" for this. A "stride iterator" wraps another iterator and an integer step size. Here's a simple sketch:

template<typename Iter>
class stride_iterator
{
    ...

    stride_iterator(Iter it, difference_type step = difference_type(1))
    : it_(it), step_(step) {}

    stride_iterator& operator++() {
        std::advance(it_,step_);
        return *this;
    }

    Iter base() const { return it_; }

    difference_type step() const { return step_; }

    ...

private:
    Iter it_;
    difference_type step_;
};

Also, helper functions like these

template<typename Iter>
stride_iterator<Iter> make_stride_iter(
    Iter it,
    typename iterator_traits<Iter>::difference_type step)
{
    return stride_iterator<Iter>(it,step);
}

template<typename Iter>
stride_iterator<Iter> make_stride_iter(
    stride_iterator<Iter> it,
    typename iterator_traits<Iter>::difference_type step)
{
    return stride_iterator<Iter>(it.base(),it.step() * step);
}

should make it fairly easy to use stride iterators:

int array[N*L];
std::sort( make_stride_iter(array,L),
           make_stride_iter(array,L)+N );

Implementing the iterator adapter all by yourself (with all operators) is probably not a good idea. As Matthieu pointed out, you can safe yourself a lot of typing if you make use of Boost's iterator adapter tools, for example.

Edit: I just realized that this doesn't do what you wanted since std::sort will only exchange the first element of each block. I don't think there's an easy and portable solution for this. The problem I see is that swapping "elements" (your blocks) cannot be (easily) customized when using std::sort. You could possibly write your iterator to return a special reference type with a special swap function but I'm not sure whether the C++ standard guarantees that std::sort will use a swap function that is looked up via ADL. Your implementation may restrict it to std::swap.

I guess the best answer is still: "Just use qsort".

sellibitze
Alexandru
sellibitze
Ouch, that's code bloat. `boost::iterator_adaptor`: http://www.boost.org/doc/libs/1_40_0/libs/iterator/doc/iterator_adaptor.html
Matthieu M.
Yeah, using some iterator adapter library seems like a good idea. For that matter, I think that stride iterators should be part of std:: or at least boost:: as well. ;)
sellibitze
A: 

I'm not sure if you can achieve the same result without a lot more work. std::sort() is made to sort sequences of elements defined by two random access iterators. Unfortunately, it determines the type of the element from the iterator. For example:

std::sort(&array[0], &array[N + L]);

will sort all of the elements of array. The problem is that it assumes that the subscripting, increment, decrement, and other indexing operators of the iterator step over elements of the sequence. I believe that the only way that you can sort slices of the array (I think that this is what you are after), is to write an iterator that indexes based on L. This is what sellibitze has done in the stride_iterator answer.

D.Shawley
A: 

I don't remember exactly how to do this, but if you can fake anonymous functions, then you can make a comp(L) function that returns the version of comp for arrays of length L... that way L becomes a parameter, not a global, and you can use qsort. As others mentioned, except in the case where your array is already sorted, or backwards or something, qsort is going to be pretty much just as fast as any other algorithm. (there's a reason it's called quicksort after all...)

Brian Postow
+1  A: 

How about having a reordering vector? You initialize vector with 1..N/L, pass std::sort a comparator that compares elements i1*L..i1*L+L to i2*L..i2*L+L, and when your vector is properly sorted, reorder the C array according to new order.

In response to comment: yes things get complicated, but it may just be good complication! Take a look here.

Arkadiy
Things get complicated if you don't use an additional array (which I won't).
Alexandru
A: 

It's not part of any ANSI, ISO, or POSIX standard, but some systems provide the qsort_r() function, which allows you to pass an extra context parameter to the comparison function. You can then do something like this:

int comp(void *thunk, const void *a, const void *b)
{
    int L = (int)thunk;
    // compare a and b as you would normally with a qsort comparison function
}

qsort_r(array, N, sizeof(int) * L, (void *)L, comp);

Alternatively, if you don't have qsort_r, you can use the callback(3) package from the ffcall library to create closures at runtime. Example:

#include <callback.h>
void comp_base(void *data, va_alist alist)
{
    va_start_int(alist);  // return type will be int

    int L = (int)data;
    const void *a = va_arg_ptr(alist, const void*);
    const void *b = va_arg_ptr(alist, const void*);

    // Now that we know L, compare
    int return_value = comp(a, b, L);

    va_return_int(alist, return_value);  // return return_value
}

...    

// In a function somewhere
typedef int (*compare_func)(const void*, const void*);

// Create some closures with different L values
compare_func comp1 = (compare_func)alloc_callback(&comp_base, (void *)L1);
compare_func comp2 = (compare_func)alloc_callback(&comp_base, (void *)L2);
...
// Use comp1 & comp2, e.g. as parameters to qsort
...
free_callback(comp1);
free_callback(comp2);

Note that the callback library is threadsafe, since all parameters are passed on the stack or in registers. The library takes care of allocating memory, making sure that memory is executable, and flushing the instruction cache if necessary to allow dynamically generated code (that is, the closure) to be executed at runtime. It supposedly works on a large variety of systems, but it's also quite possible that it won't work on yours, either due to bugs or lack of implementation.

Also note that this adds a little bit of overhead to the function call. Each call to comp_base() above has to unpack its arguments from the list passed it (which is in a highly platform-dependent format) and stuff its return value back in. Most of the time, this overhead is miniscule, but for a comparison function where the actual work performed is very small and which will get called many, many times during a call to qsort(), the overhead is very significant.

Adam Rosenfield
A: 

Arkadiy has the right idea. You can sort in place if you create an array of pointers and sort that:

#define NN 7
#define LL 4

int array[NN*LL] = {
    3, 5, 5, 5,
    3, 6, 6, 6,
    4, 4, 4, 4,
    4, 3, 3, 3,
    2, 2, 2, 2,
    2, 0, 0, 0,
    1, 1, 1, 1
};

struct IntPtrArrayComp {
    int length;
    IntPtrArrayComp(int len) : length(len) {}
    bool operator()(int* const & a, int* const & b) {
        for (int i = 0; i < length; ++i) {
            if (a[i] < b[i]) return true;
            else if (a[i] > b[i]) return false;
        }
        return false;
    }
};

void sortArrayInPlace(int* array, int number, int length)
{
    int** ptrs = new int*[number];
    int** span = ptrs;
    for (int* a = array; a < array+number*length; a+=length) {
        *span++ = a;
    }
    std::sort(ptrs, ptrs+number, IntPtrArrayComp(length));
    int* buf = new int[number];
    for (int n = 0; n < number; ++n) {
        int offset = (ptrs[n] - array)/length;
        if (offset == n) continue;

        // swap
        int* a_n = array+n*length;
        std::move(a_n, a_n+length, buf);
        std::move(ptrs[n], ptrs[n]+length, a_n);
        std::move(buf, buf+length, ptrs[n]);

        // find what is pointing to a_n and point it 
        // to where the data was move to
        int find = 0;
        for (int i = n+1; i < number; ++i) {
            if (ptrs[i] == a_n) {
                find = i;
                break;
            }
        }
        ptrs[find] = ptrs[n];
    }
    delete[] buf;
    delete[] ptrs;
}

int main()
{
    for (int n = 0; n< NN; ++n) {
        for (int l = 0; l < LL; ++l) {
            std::cout << array[n*LL+l];
        }
        std::cout << std::endl;
    }
    std::cout << "----" << std::endl;
    sortArrayInPlace(array, NN, LL);
    for (int n = 0; n< NN; ++n) {
        for (int l = 0; l < LL; ++l) {
            std::cout << array[n*LL+l];
        }
        std::cout << std::endl;
    }
    return 0;
}

Output:

3555
3666
4444
4333
2222
2000
1111
----
1111
2000
2222
3555
3666
4333
4444
jmucchiello
replace int* buf = new int[number];with int* buf = new int[length];Arkady's answer was correct, but I wanted a solution without a second array. I'm going to accept his answer as being the best solution 'couse he came up with the original idea.
Alexandru
A: 

A lot of these answers seem like overkill. If you really have to do it C++ style, using jmucchiello's example:

template <int Length>
struct Block
{
    int n_[Length];

    bool operator <(Block const &rhs) const
    {
     for (int i(0); i < Length; ++i)
     {
      if (n_[i] < rhs.n_[i])
       return true;
      else if (n_[i] > rhs.n_[i])
       return false;
     }
     return false;
    }
};

and then sort with:

sort((Block<4> *)&array[0], (Block<4> *)&array[NN]);

It doesn't have to be any more complicated.

Matt Joiner
This requires the length to be known at compile time, though.
Charles Bailey
not hard to modify otherwise. from the example i based it on, it is.
Matt Joiner