views:

212

answers:

10

What's the fastest/cleanest way to find the bit offset of the first unset bit in an array of arbitrary length?

Assume the prototype for your function is something like this size_t first_unset_bit(char unsigned const *buf, size_t bit_count, size_t start_bit); and that it may be called several times in quick succession on the same buffer. If you can give a better prototype, please justify.

If you use assembly any assembly, please provide a x86 sample that will run on a core2 or later. I will award the answer to the solution that provides the best combination of speed and beauty.

Update0

Here's my naive implementation. I have no idea if it's actually right yet, it's not yet used in a live system.

static size_t first_unset_bit(char unsigned const *buf, size_t bit_count, size_t start_bit)
{
    for (; start_bit < bit_count; ++start_bit)
    {
        size_t buf_index = start_bit / CHAR_BIT;
        int bit_index = start_bit % CHAR_BIT;
        if (!((buf[buf_index] >> bit_index) & 1))
            return start_bit;
    }
    return -1;
}
+2  A: 

In x-86 assembly language,

REPE SCAS 0xFFFFFFFF

...is likely to form an important part of the answer!

You have no choice but to examine every bit preceding the first unset bit, so it's just down to how quickly you can do that. Comparing 32 bits at a time is a good start, and once you get down to knowing which WORD contains the first unset bit you can use a combination of shifts/lookuptables to locate the first unset bit in that word.

Roddy
This looks very promising. What's the chance that GCC will generate this anyway in higher levels of optimization? I'll try it out.
Matt Joiner
This is your best chance of getting `repe scasd` with plain C: `wchar_t c[2] = { -1, 0 }; size_t cnt = wcsspn(buf, c) * sizeof(wchar_t);`
R..
Note that the Linux folks discarded a routine using repe scas in favor of plain C: LKML thread: http://lkml.indiana.edu/hypermail/linux/kernel/0803.1/0344.html
@R..: Clever, use of 32bit wchar_t in a library function, very good observation.
Matt Joiner
Yeah, sadly it depends on `wcsspn` being optimized for the case of single-character second argument; I'm not sure how common this is. I only thought of it because my own implementation has this optimization (but not the asm; it's plain C).
R..
REPE SCAS isn't necessarily the best choice on all processors.
peterchen
A: 

The obvious solution is to just loop through from start_bit until you either get to the end of the array or find an unset bit.

Because it can be of arbitrary length you can't just turn it into a number and find the value that way, as it may likely be largr than the size of a double.

James Black
A: 

I'm assuming your buffer is aligned, e.g. a buffer returned by malloc. If not you'll need to scan the unaligned part at the beginning first.

uint32_t *p = (void *)buf;
while (!(*p+1)) p++;
size_t cnt = (unsigned char *)p - buf << CHAR_BIT;
if (*p>=0xFFFF0000)
  if (*p>=0xFFFFFF00)
    if (*p>=0xFFFFFFF0)
      if (*p>=0xFFFFFFFC)
        if (*p>=0xFFFFFFFE) cnt+=31;
        else cnt+=30;
      else
        if (*p>=0xFFFFFFF9) cnt+=29;
        else cnt+=28;
    else
      if (*p>=0xFFFFFFC0)
        if (*p>=0xFFFFFFE0) cnt+=27;
        else cnt+=26;
      else
        if (*p>=0xFFFFFF90) cnt+=25;
        else cnt+=24;
  else
    ...

I'll leave filling in the rest of the binary search to you.

R..
+1  A: 

Linux has what I imagine to be a highly tuned implementation under the name "find_first_zero_bit".

Care to link to it?
Matt Joiner
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.34.y.git;a=blob;f=lib/find_next_bit.c;h=24c59ded47a05d5617e2fe2501077f990de1ec4b;hb=master and http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.34.y.git;a=blob;f=arch/x86/include/asm/bitops.h;h=02b47a603fc88ee76f316989bc0475cb89be5e24;hb=master for ffz
Excellent, thanks. This sets a definite benchmark. To add to that, it's extremely clever and involves some nice architecture specific assembly speedups.
Matt Joiner
+1  A: 

Optimization hint: create lookup table that maps byte value to first unset bit than loop bytes but not bits.

olegz
+1  A: 

Without using any assembly language, but with GCC builtins, and assuming that bit_count is a multiple of the number of bits in a long, something like this should work. I changed your function to take a void* buffer argument to avoid aliasing problems. Totally untested, I might have messed up the math especially in the leading "if (start_bit % LONG_BIT) block.

#include <stddef.h>
#include <limits.h>
#define LONG_BIT (CHAR_BIT * sizeof(unsigned long))

size_t
first_unset_bit(const void *buf, size_t bit_count, size_t start_bit)
{
    size_t long_count = bit_count / LONG_BIT;
    size_t start_long = start_bit / LONG_BIT;

    const unsigned long *lbuf = (const unsigned long *)buf;

    if (start_bit % LONG_BIT)
    {
        size_t offset = start_bit % LONG_BIT;
        unsigned long firstword = lbuf[start_long];
        firstword = ~(firstword | ~((1UL << offset) - 1));
        if (firstword)
            return start_bit - offset + __builtin_clzl(firstword);

        start_long += 1;
    }

    for (size_t i = start_long; i < long_count; i++)
    {
        unsigned long word = lbuf[i];
        if (~word)
            return i*LONG_BIT + __builtin_clzl(~word);
    }
    return bit_count + 1; // not found
}
Zack
Can you explain what aliasing problems you're avoiding?
Matt Joiner
I may have misremembered the rules. I have had compilers object to converting a `char*` to a `long*` and then dereferencing the converted pointer, but just now gcc doesn't seem to mind.
Zack
A: 

As others have mentionned, assembly language may give the best performance. If that is not an option, you may wish to consider the following (untested) routine. It is not quite that for which you asked, but it should be close enough that you can adapt it to your needs.

size_t findFirstNonFFbyte (
    unsigned char const *buf,       /* ptr to buffer in which to search */
    size_t               bufSize,   /* length of the buffer */
    size_t               startHint  /* hint for the starting byte (<= bufSize) */
    ) {
    unsigned char * pBuf = buf + startHint;
    size_t          bytesLeft;

    for (bytesLeft = bufSize - startHint;
         bytesLeft > 0;
         bytesLeft = startHint, pBuf = buf) {
        while ((bytesLeft > 0) && (*pBuf == 0xff)) {
            *pBuf++;
            bytesLeft--;
        }

        if (bytesLeft > 0) {
            return ((int) (pBuf - buf));
        }
    }
    return (-1);
}

To use ...

index = findFirstNonFFbyte (...);
bit_index = index + bitTable[buffer[index]];

Additional Notes:

The above code will check 8 bits at a time. If you know your buffer will be 4-byte aligned and its length an even multiple of 4 bytes, then you can test 32-bits at a time with a little tweaking (don't forget the return value calculation).

If your starting bit is not a hint but an absolute, then you can skip the for-loop.

You will need to provide your own bit lookup table. It should be an array 256 bytes long. Each entry identifies the first clear bit of the byte that indexes that entry. Personal experience tells me that different people will number those bits differently. Some call bit 0 the most signficant bit of the byte; others will call bit 0 the least significant bit of the byte. Whatever style you pick, be sure to be consistent.

Hope this helps.

Sparky
A: 

use the gcc builtin equivalent of Microsoft's _BitScanReverse, I use something like this to find the first free bit(representing block usage) for my memory system:

        __forceinline DWORD __fastcall GetNextFreeBlockIndex(PoolBlock* pPoolBlock)
        {
            DWORD dwIndex;
            DWORD dwOffset = 0;
            DWORD* pUsage = &pPoolBlock->fUsage[0];
            while(dwOffset < MMANAGER_BLOCKS_PER_POOL)
            {
                DWORD dwUsage = *pUsage;
                if(dwUsage != 0xFFFFFFFF && _BitScanForward(&dwIndex,~dwUsage))
                {
                    #if !( MMANAGER_ATOMIC_OPS )
                        pPoolBlock->pSync.Enter();
                    #endif

                    ATOMIC_Write(DWORD,pPoolBlock->dwFreeIndex,dwOffset);
                    ATOMIC_Write(DWORD*,pPoolBlock->pFreeUsage,pUsage);

                    #if !( MMANAGER_ATOMIC_OPS )
                        pPoolBlock->pSync.Leave();
                    #endif

                    return dwIndex + dwOffset;
                }

                pUsage++;
                dwOffset += 32;
            }

            return 0xFFFFFFFF;
        }

        __forceinline DWORD __fastcall GetFreeBlockIndex(PoolBlock* pPoolBlock)
        {
            DWORD dwIndex;
            DWORD dwUsage = *pPoolBlock->pFreeUsage;
            if(dwUsage == 0xFFFFFFFF)
                return GetNextFreeBlockIndex(pPoolBlock);

            if(_BitScanForward(&dwIndex,~dwUsage))
                return dwIndex + pPoolBlock->dwFreeIndex;

            return 0xFFFFFFFF;
        }

excuse the tabbing, this is straight outta some #if/#endif VS code. ofc this code is made just for DWORDS 's, you can just do block_size & 3 to find if there are any odd bytes, copy those odd bytes to a DWORD and scan the DWORD, then cut of any results greater than (block_size & 3) << 3

Necrolis
what's the point of `__fastcall` if you're forcing the code to inline? :P also it's well known that fastcall isn't actually fast
Matt Joiner
Its merely for debugging purposes(aka disabling of inlines in debug mode), I just prefer running through __fastcall funcs in ollydbg as apposed to __stdcall(don't ask, just some weird habit I picked up when I write inlines in MSVC...)
Necrolis
A: 

http://graphics.stanford.edu/~seander/bithacks.html

Merlyn Morgan-Graham
Nice but I don't see a "first zero bit" hack there.
Matt Joiner
@Matt Joiner: Bitwise not operator, and "first one bit" becomes "first zero bit".
Merlyn Morgan-Graham
There is no first one bit either...
Matt Joiner
@Matt Joiner: http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious If you want this to no longer be position, but the actual bit, in place, then just bit-shift 1.
Merlyn Morgan-Graham
+1  A: 

Often overlooked, strings.h (yes, that standard header) contains a bunch of functions: ffs, ffsl and the likes, see here for more information. At least with gcc and on x86, this compiles to the corresponding one-cycle instruction, for instance BSFL.

I would thus suggest:

  1. add a sentinel 0xFFFF at the end of your array
  2. divide bit_count by 4 (so your iterating over 4-byte blocks instead of bytes)
  3. use a while loop to find the block with the first set bit

For example:

cursor = start_pos;
while(position = ffsl(buf))
  cursor++;
return (cursor - startpos) * 32 + pos;

(Except you have to test whether youve reached the sentinel, in which case the buffer is blank.)

Although you should take this with a grain of salt because I don't purport to be an assembly expert... you'd basically use little more than 3 cycles for every 32 bits (one incrementation, one comparison, one BSFL instruction), and imagine you can do better using the long version of the function.

Jérémie
Your code seems a bit off, but the presence of this in the C library is great to know. I'll post an update with this in use.
Matt Joiner
Why do you add a sentinel?
Matt Joiner
I add a sentinel at the end of the array so I only have to make one test in the while loop: I only have to test whether "position = ffsl(buf)" is different from zero (which is what ffsl will return if there are not bits set in the buffer). Without the sentinel, I would have, in addition, to make sure I'm not going out of bounds.There is indeed a flagrant error in the code: I am not increasing the address of the buffer. Actually you should get rid of "cursor" and replace all instances of "cursor" by "buf". (Or at least that's the idea.)
Jérémie