views:

1172

answers:

7

I have a bit array implementation where the 0th index is the MSB of the first byte in an array, the 8th index is the MSB of the second byte, etc...

What's a fast way to find the first bit that is set in this bit array? All the related solutions I have looked up find the first least significant bit, but I need the first most significant one. So, given 0x00A1, I want 8 (since it's the 9th bit from the left).

+2  A: 

Look up the BSR (Bit scan reverse) x86 asm instruction for the fastest way to do this. From Intel's doc: Searches the source operand (second operand) for the most significant set bit (1 bit). If a most significant 1 bit is found, its bit index is stored in the destination operand (first operand).

ggiroux
Or, on the PowerPC, cntlwi
Crashworks
A: 

http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious

Martin Beckett
Heh, I have the exact same URL, `#IntegerLogObvious` included, in my answer.
Arkku
+2  A: 

There are multiple ways to do this, and the relative performance of the different implementations is somewhat machine-dependent (I happen to have benchmarked this to some extent for a similar purpose). On some machines there's even a built-in instruction for this (use one if available and portability can be dealt with).

Check out some implementations here (under “integer log base 2”). If you are using GCC, check out the functions __builtin_clz and __builtin_clzl (which do this for non-zero unsigned ints and unsigned longs, respectively). The “clz” stands for “count leading zeros”, which is yet another way to describe the same problem.

Of course, if your bit array does not fit into a suitable machine word, you need to iterate over words in the array to find the first non-zero word and then perform this calculation only on that word.

Arkku
A: 

Here's a simple, brute force algorithm for an arbitrary-sized array of bytes:

int msb( unsigned char x);  // prototype for function that returns 
                            //  most significant bit set

unsigned char* p;

for (p = arr + num_elements; p != arr;) {
    --p;
    if (*p != 0) break;
}

// p is with pointing to the last byte that has a bit set, or
//  it's pointing to the first byte in the array

if (*p) {
    return ((p - arr) * 8) + msb( *p);
}

// what do you want to return if no bits are set?
return -1;

I'll leave it as a an exercise for the reader to come up with an appropriate msb() function as well as the optimization to work on int or long long sized chinks of data.

Michael Burr
A: 

Um, your tag indicates 32bit but it looks like the values that you're using are 16 bit. If you did mean 32 bit, then I think the answer for 0x00a1 ought to be 24 and not 8.

Assuming that you are looking for the MSB bit index from the left hand side and you know that you will only be dealing with uint32_t's, here's the obvious, simple-minded algorithm:

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>

int main()
{
    uint32_t test_value = 0x00a1;
    int i;

    for (i=0; i<32; ++i)
    {
        if (test_value & (0x80000000 >> i))
        {
            printf("i = %d\n", i);
            exit(0);
        }
    }

    return 0;
}
tikiboy
yea but it's too slow =(
Claudiu
+2  A: 

GCC has __builtin_clz that translates to BSR on x86/x64, CLZ on ARM, etc. and emulates the instruction if the hardware does not implement it.
Visual C++ 2005 and up has _BitScanReverse.

andras
ty for the link
Claudiu
A: 

Two best ways I know to do this in pure C:

First linear-search the byte/word array to find the first byte/word that's nonzero, then do an unrolled binary-search of the byte/word you find.

if (b>=0x10)
  if (b>=0x40)
    if (b>=0x80) return 0;
    else return 1;
  else
    if (b>=0x20) return 2;
    else return 3;
else
  if (b>=0x4)
    if (b>=0x8) return 4;
    else return 5;
  else
    if (b>=0x2) return 6;
    else return 7;

3 (BTW that's log2(8)) conditional jumps to get the answer. On modern x86 machines the last one will be optimized to a conditional mov.

Alternatively, use a lookup table to map the byte to the index of the first bit that's set.

A related topic you might want to look up is integer log2 functions. If I recall, ffmpeg has a nice implementation.

Edit: You can actually make the above binary search into a branchless binary search, but I'm not sure if it would be more efficient in this case...

R..