tags:

views:

71

answers:

2
+2  Q: 

Efficient memcspn

Does anyone know of an efficient implementation of a memcspn function?? It should behave like strcspn but look for the span in a memory buffer and not in a null terminated string. The target compiler is visualC++ .

Thanks, Luca

A: 

It would seem pretty difficult to write an inefficient implementation of this function, TBH - the implementation seems pretty straightforward, so I'd suggest writing this yourself if you can't find an implementation in a reasonable timeframe.

Will A
It's actually pretty easy to write an inefficient implementation - for large character sets, e.g. 32-bit `wchar_t`, the inefficient implementation is the only practical implementation and it's `O(nm)`.
R..
@R.: Well, an `O((m + n) log m)` implementation isn't too hard (sort `needle` then binary search within it rather than linear search). And a method similar to yours but using a hash table rather than a simple array would work for 32-bit `wchar_t` and have a similar running time in the average case.
caf
Sorting the "needle" (sorta a misnomer - actually the set of characters) is `O(m)` in space, since you need working space for a copy of it. It's conceivable that this could make the approach infeasible. At least you'd need a fallback case for huge sets.
R..
Of course, a saner design would be to just require the `set` argument to already be sorted... if you can make this requirement.
R..
@all: Very good points - I hadn't considered the search-for-chars being anything more than a few chars - so lack of binary search does make the naive approach much more inefficient.
Will A
+2  A: 

One near-optimal implementation:

size_t memcspan(const unsigned char *buf, size_t len, const unsigned char *set, size_t n)
{
    size_t i;
    char set2[1<<CHAR_BIT] = {0};
    while (n--) set2[set[n]] = 1;
    for (i=0; i<len && !set2[buf[i]]; i++);
    return i;
}

It might be better to use a bit array instead of a byte array for set2, depending on whether arithmetic or a little bit more cache thrashing is more expensive on your machine.

R..
One nit, that is easily repaired: You've confused `strcspn()` with `strpbrk()`. The latter returns a pointer to the first instance from `set`, the former returns the offset to the first instance. Obviously, the same underlying logic gets you either, aside from the edge case of no members of `set` in `buf`.
RBerteig
OK, fixing it...
R..