views:

467

answers:

9

I want to find the fastest way to get the index of the lowest order bit of a long long. ie:

00101001001000 -> 3

Solutions involving looping and shifting are too slow. ie:

int i;
if(bits == 0ULL) {
  i = 64;
} else {
  for(i = 0;!(bits & 1ULL);i++)
    bits >>= 1;
}

EDIT: Info on usage

The function that uses ffsll can't really reduce its usage, but here it is (simplified of course). It just iterates through the indices and does something with them. This function is perhaps the most widely used function in my whole application despite a lot of caching of its value. It's a legal move generator in my alpha-beta search engine.

while(bits){
  index = ffsll(bits);
  doSomething(index);
  index &= index-1;
}
+6  A: 

The fastest I've found is ffsll(long long) in string.h.

dharga
+1 for using a library function, I wasn't aware of that one.
therefromhere
Note that `ffsll()` counts the bit positions starting from 1, so to match the question you need to do `ffsll(val) - 1`, and a result of -1 means no bits set.
therefromhere
+9  A: 

Intel has specialized instructions for finding either lowest or highest order set bit. BSF seems like what you need. as far as doing it in plain C, maybe the bit twiddling hacks page has what you need.

At the very least you could use a table of nibbles or bytes to speed things up. Something like this (demonstrated for int, but easily changeable to longlong as needed).

/*
0000 - 0
0001 - 1
0010 - 2
0011 - 1
0100 - 3
0101 - 1
0110 - 2
0111 - 1
1000 - 4
1001 - 1
1010 - 2
1011 - 1
1100 - 3
1101 - 1
1110 - 2
1111 - 1
*/

int ffs(int i) {
    int ret = 0;
    int j = 0;
    static const int _ffs_tab[] = 
     { 0, 1, 2, 1, 3, 1, 2, 1, 4, 1, 2, 1, 3, 1, 2, 1 };

    while((i != 0) && (ret == 0)) {
     ret = _ffs_tab[i & 0x0f];

     if(ret > 0) {
      break;
     }

     i >>= 4;
     j += 4;

     /* technically the sign bit could stay, so we mask it out to be sure */
     i &= INT_MAX;
    }

    if(ret != 0) {
     ret += j;
    }

    return ret;
}
Evan Teran
+1 for the bit twiddling hacks. You probably want "Count the consecutive zero bits (trailing) on the right with modulus division and lookup" or "Count the consecutive zero bits (trailing) on the right by binary search". The first is constant time.
plinth
for my benchmark ffs runs in 45% of the time this function takes, i believe ffs is a portable wrapper for BSF
dharga
plus this only works for [0,15] i need it for [0,0xFFFFFFFFFFFFFFFF] not really possible
dharga
please post the the function using ffs. perhaps we could rework the function to need to do it less?
Evan Teran
what do you mean [0,15]? BSF works for 32-bits and ffs works for `int` which can be one of a few different widths. If you need it to work for 64-bits then you can just do it multiple times with bit shifting if not found yet (similar to my example with a table which only handles a nibble at a time).
Evan Teran
ahh, this is true, but then we're shifting and looping...probably going to get pretty slow. though a neat solution
dharga
Like I said, perhaps you can avoid doing the `ffs` so often. Can you provide more information on how this is being used?
Evan Teran
+2  A: 

How about implementing a kind of binary search?

Look at the low bits resulting from a bit wise and of a mask value that is all ones in the low half. If that value is zero you know the smallest bit is in the upper half of the number.

other wise cut the thing in half and go again.

Mikeb
tedious, but i believe the fastest way to do it in plain C
dharga
For 64 bit, binary search will take 6 mask-and-compare; the code originally posted on average (assuming random data) only takes two (albeit with a worst case of 64).
Steve Gilham
A: 

How about something like this? It reduces the number of loops considerably.

int shifts = 0;
if ((bits & 0xFFFFFFFFFFFFULL) == 0) // not in bottom 48 bits
{
    shifts = 48;
}
else if ((bits & 0xFFFFFFFFFFULL == 0) // not in bottom 40 bits
{
    shifts = 40;
}
else
// etc

bits >>= shifts;  // do all the shifts at once

// this will loop at most 8 times
for(i = 0;!(bits & 1ULL);i++)
    bits >>= 1;

index = shifts + i;
Dr. Tim
if i was to do this, i'd go with the binary search idea mentioned in another answer
dharga
+2  A: 

You can isolate the lowest set bit with x & (~x + 1); this gives you the lowest bit value, not the index (e.g., if x = 01101000, then the result is 00001000). The fastest way I know of to get from there to an index is probably a switch statement:

switch(x & (~x + 1))
{
  case     0ULL: index = -1; break;
  case     1ULL: index =  0; break;
  case     2ULL: index =  1; break;
  case     4ULL: index =  2; break;
  ...
  case 9223372036854775808ULL: index = 63; break;
}

Ugly, but no looping involved.

John Bode
except that now you introduce rediculous amounts of branching...not efficient either
dharga
Rob
@dharga: yeah, I was assuming a switch being implemented as a jump table, but a quick experiment shows that my platform generates the same code for a switch as an if-else chain, so that idea's out. Next best solution would be a lookup table, but a lookup table for 64-bit values would be a bit ... big. A combination of shifting and an 8 or 16-bit lookup table would work better, but at that point you're probably better off converting to double and using log2.
John Bode
A: 

I wrote two functions, they are returning the same result as ffsll().

int func1( uint64_t n ){
  if( n == 0 ) return 0;
  n ^= n-1;
  int i = 0;
  if( n >= 1ull<<32 ){ n>>=32; i+=32; }
  if( n >= 1ull<<16 ){ n>>=16; i+=16; }
  if( n >= 1ull<< 8 ){ n>>= 8; i+= 8; }
  if( n >= 1ull<< 4 ){ n>>= 4; i+= 4; }
  if( n >= 1ull<< 2 ){ n>>= 2; i+= 2; }
  if( n >= 1ull<< 1 ){         i+= 1; }
  return i+1;
}

int func2( uint64_t n ){
  return n? ((union ieee754_float)((float)(n^(n-1)))).ieee.exponent-126: 0;
}

I don't know which is the fastest: ffsll(), func1() or func2() ?

sambowry
A: 

you can halve the complexity of your algorithm checking first if your number is odd or even. If it is even you have the lowest order bit is the first one.

For odd cases you can implement such a binary search...

Lopoc
that's just checking the first bit then moving on to the rest...
dharga
ok well... you're right, but it's just an operation a it doesn't involve use of loops and shift, it's just a preliminary check you can choose to act or not!
Lopoc
+2  A: 

This might work for 32 bits. Should be easy enough to extend to 64.

// all bits left of lsb become 1, lsb & right become 0
y = x ^ (-x);

// XOR a shifted copy recovers a single 1 in the lsb's location
u = y ^ (y >> 1);

// .. and isolate the bit in log2 of number of bits
i0 = (u & 0xAAAAAAAA) ?  1 : 0;
i1 = (u & 0xCCCCCCCC) ?  2 : 0;
i2 = (u & 0xF0F0F0F0) ?  4 : 0;
i3 = (u & 0xFF00FF00) ?  8 : 0;
i4 = (u & 0xFFFF0000) ? 16 : 0;
index = i4 | i3 | i2 | i1 | i0;

Obviously, if there is some way to have the hardware do it, i.e., if special CPU instructions are available, that is the way to go.

JustJeff
+1  A: 

If using Visual Studio, _BitScanForward:

For gcc, try __builtin_ctz or __builtin_ffs:

As always the generated code should be consulted to ensure the correct instructions are being generated.

brone