views:

528

answers:

7

I have a single buffer, and several pointers into it. I want to sort the pointers based upon the bytes in the buffer they point at.

qsort() and stl::sort() can be given custom comparision functions. For example, if the buffer was zero-terminated I could use strcmp:

int my_strcmp(const void* a,const void* b) {
  const char* const one = *(const char**)a,
  const two = *(const char**)b;
  return ::strcmp(one,two);
}

however, if the buffer is not zero-terminated, I have to use memcmp() which requires a length parameter.

Is there a tidy, efficient way to get the length of the buffer into my comparision function without a global variable?

+2  A: 

Can you pack your buffer pointer + length into a structure and pass a pointer of that structure as void *?

Ates Goral
How does this work with `qsort` or `std::sort`?
J.F. Sebastian
You create a new array of these structures and sort that using a custom comparison function. If you must access the original array (instead of the underlying data), you can maintain pointers or indexes from the new array to the old, which you can use after it is sorted.
Randy Proctor
+3  A: 

Is there a reason you can't null-terminate your buffers?

If not, since you're using C++ you can write your own function object:

 struct MyStrCmp {
    MyStrCmp (int n): length(n) { }
    inline bool operator< (char *lhs, char *rhs) {
       return ::strcmp (lhs, rhs, length);
    }
    int length;
 };
 // ...
 std::sort (myList.begin (), myList.end (), MyStrCmp (STR_LENGTH));
eduffy
beat me by 30 secs :) +1
Evan Teran
Note: std::sort takes iterators **not** containers.
Evan Teran
haha! -- at least you got the std::sort call correct .. forgot about iterators ;)
eduffy
thanks, you're accepted answer since you answered slightly faster than Evan, and Evan has *tons* of points already.I put a working version Evan's comments; pls update your code too?
Will
@Will: aww, rep bias :(. Oh well, no hard feelings, glad you were able to get a working solution.
Evan Teran
+7  A: 

With std::sort, you can use a Functor like this:

struct CompString {
    CompString(int len) : m_Len(len) {}
    bool operator<(const char *a, const char *b) const {
        return std::memcmp(a, b, m_Len);
    }
private:
    int m_Len;
};

Then you can do this:

std::sort(begin(), end(), CompString(4)); // all strings are 4 chars long

EDIT: from the comment suggestions (i guess both strings are in a common buffer?):

struct CompString {
    CompString (const unsigned char* e) : end(e) {}
    bool operator()(const unsigned char *a, const unsigned char *b) const {
        return std::memcmp(a, b, std::min(end - a, end - b)) < 0;
    }
private:
    const unsigned char* const end;
};
Evan Teran
struct CompString { CompString (const unsigned char* e) : end(e) {} bool operator()(const unsigned char *a, const unsigned char *b) const { return (-1 == ::memcmp(a, b, __min(end-a,end-b))); } private: const unsigned char* const end; };
Will
This approach got me started.I just posted some minor corrections. Pls update the code.
Will
I couldn't get the operator< to compile, but with a functor overloading the (), it worked a charm!
Will
for the sake of completeness, I should say that my solution failed to account for when the bytes matched - as far as the end of the buffer, but obviously one string would be longer than the other...
Will
well you implied in the original request that all strings being compared were the same length...
Evan Teran
yeah my bad, its all about BWT, but I didn't want to derail this with a discussion about the best ways to do BWT
Will
A: 

You could functors (give the length to the functor's constructor) or Boost.Lambda (use the length in-place).

J.F. Sebastian
A: 

I'm not clear on what you're asking. But I'll try, assuming that

  • You have a single buffer
  • You have an array of pointers of some kind which has been processed in some way so that some or all of its contents point into the buffer

That is code equivalent to:

char *buf = (char*)malloc(sizeof(char)*bufsize);
for (int i=0; i<bufsize; ++i){
    buf[i] = some_cleverly_chosen_value(i);
}

char *ary[arraysize] = {0};
for(int i=0; i<arraysize; ++i){
   ary[i] = buf + some_clever_function(i);
}

/* ...do the sort here */

Now if you control the allocation of the buffer, you could substitute

char *buf = (char*)malloc(sizeof(char)*(bufsize+1));
buf[bufsize]='\0';

and go ahead using strcmp. This may be possible even if you don't control the filling of the buffer.

If you have to live with a buffer handed you by someone else you can

  1. Use some global storage (which you asked to avoid and good thinking).
  2. Hand the sort function something more complicated than a raw pointer (the address of a struct or class that supports the extra data). For this you need to control the deffinition of ary in the above code.
  3. Use a sort function which supports an extra input. Either sort_r as suggested by Adam, or a home-rolled solution (which I do recommend as an exercise for the student, and don't recommend in real life). In either case the extra data is probably a pointer to the end of the buffer.
dmckee
What?!? The problem statement _is_ unclear. One byte? Two bytes? Up to the end of the buffer? That value of the pointer? All of these are valid understandings of what Will has written.
dmckee
Second sentence: "I want to sort the pointers based upon the bytes in the buffer they point at." OP is asking how to pass the buffer size into the sorting routine's comparison function. Your answer provides nothing helpful in that regard.
Adam Rosenfield
I really don't find that to be clear at all. I'm convince that he means from the pointer to the end because he's not complaining about other answers that use that meaning, but... ANyway, I'll work up an answer on that basis.
dmckee
A: 

memcmp should stop on the first byte that is unequal, so the length should be large, i.e. to-the-end-of-the-buffer. Then the only way it can return zero is if it does go to the end of the buffer.

(BTW, I lean toward merge sort myself. It's stable and well-behaved.)

Mike Dunlavey
Yes, but how does memcmp() know how big the buffer is? That is the problem at hand.
Adam Rosenfield
I guess that's just a plumbing problem. You can't do any worse that writing your own merge sort and passing along any information you care to.
Mike Dunlavey
... and of course qsort_r is also valid, if you have it.
Mike Dunlavey
+3  A: 

With the C function qsort(), no, there is no way to pass the length to your comparison function without using a global variable, which means it can't be done in a thread-safe manner. Some systems have a qsort_r() function (r stands for reentrant) which allows you to pass an extra context parameter, which then gets passed on to your comparison function:

int my_comparison_func(void *context, const void *a, const void *b)
{
    return memcmp(*(const void **)a, *(const void **)b, (size_t)context);
}

qsort_r(data, n, sizeof(void*), (void*)number_of_bytes_to_compare, &my_comparison_func);
Adam Rosenfield
BSD systems, including MacOS X, include qsort_r(); it appears other platforms do not.
Jonathan Leffler