ansaurus

Question

Answer 1

+6 A:

The fastest I've found is ffsll(long long) in string.h.

dharga 2009-09-25 15:33:03

+1 for using a library function, I wasn't aware of that one.

therefromhere 2009-09-26 15:10:22

Note that `ffsll()` counts the bit positions starting from 1, so to match the question you need to do `ffsll(val) - 1`, and a result of -1 means no bits set.

therefromhere 2009-09-26 15:15:42

Answer 2

+9 A:

Intel has specialized instructions for finding either lowest or highest order set bit. BSF seems like what you need. as far as doing it in plain C, maybe the bit twiddling hacks page has what you need.

At the very least you could use a table of nibbles or bytes to speed things up. Something like this (demonstrated for int, but easily changeable to longlong as needed).

/*
0000 - 0
0001 - 1
0010 - 2
0011 - 1
0100 - 3
0101 - 1
0110 - 2
0111 - 1
1000 - 4
1001 - 1
1010 - 2
1011 - 1
1100 - 3
1101 - 1
1110 - 2
1111 - 1
*/

int ffs(int i) {
    int ret = 0;
    int j = 0;
    static const int _ffs_tab[] = 
     { 0, 1, 2, 1, 3, 1, 2, 1, 4, 1, 2, 1, 3, 1, 2, 1 };

    while((i != 0) && (ret == 0)) {
     ret = _ffs_tab[i & 0x0f];

     if(ret > 0) {
      break;
     }

     i >>= 4;
     j += 4;

     /* technically the sign bit could stay, so we mask it out to be sure */
     i &= INT_MAX;
    }

    if(ret != 0) {
     ret += j;
    }

    return ret;
}

Evan Teran 2009-09-25 15:38:42

+1 for the bit twiddling hacks. You probably want "Count the consecutive zero bits (trailing) on the right with modulus division and lookup" or "Count the consecutive zero bits (trailing) on the right by binary search". The first is constant time.

plinth 2009-09-25 15:44:40

for my benchmark ffs runs in 45% of the time this function takes, i believe ffs is a portable wrapper for BSF

dharga 2009-09-25 16:04:36

plus this only works for [0,15] i need it for [0,0xFFFFFFFFFFFFFFFF] not really possible

dharga 2009-09-25 16:09:16

please post the the function using ffs. perhaps we could rework the function to need to do it less?

Evan Teran 2009-09-25 16:09:47

what do you mean [0,15]? BSF works for 32-bits and ffs works for `int` which can be one of a few different widths. If you need it to work for 64-bits then you can just do it multiple times with bit shifting if not found yet (similar to my example with a table which only handles a nibble at a time).

Evan Teran 2009-09-25 16:13:09

ahh, this is true, but then we're shifting and looping...probably going to get pretty slow. though a neat solution

dharga 2009-09-25 16:18:02

Like I said, perhaps you can avoid doing the `ffs` so often. Can you provide more information on how this is being used?

Evan Teran 2009-09-25 16:19:56

Answer 3

+2 A:

How about implementing a kind of binary search?

Look at the low bits resulting from a bit wise and of a mask value that is all ones in the low half. If that value is zero you know the smallest bit is in the upper half of the number.

other wise cut the thing in half and go again.

Mikeb 2009-09-25 15:43:15

tedious, but i believe the fastest way to do it in plain C

dharga 2009-09-25 16:08:07

For 64 bit, binary search will take 6 mask-and-compare; the code originally posted on average (assuming random data) only takes two (albeit with a worst case of 64).

Steve Gilham 2009-09-25 23:09:31

Answer 4

A:

How about something like this? It reduces the number of loops considerably.

int shifts = 0;
if ((bits & 0xFFFFFFFFFFFFULL) == 0) // not in bottom 48 bits
{
    shifts = 48;
}
else if ((bits & 0xFFFFFFFFFFULL == 0) // not in bottom 40 bits
{
    shifts = 40;
}
else
// etc

bits >>= shifts;  // do all the shifts at once

// this will loop at most 8 times
for(i = 0;!(bits & 1ULL);i++)
    bits >>= 1;

index = shifts + i;

Dr. Tim 2009-09-25 15:53:19

if i was to do this, i'd go with the binary search idea mentioned in another answer

dharga 2009-09-25 19:56:34

Answer 5

+2 A:

You can isolate the lowest set bit with x & (~x + 1); this gives you the lowest bit value, not the index (e.g., if x = 01101000, then the result is 00001000). The fastest way I know of to get from there to an index is probably a switch statement:

switch(x & (~x + 1))
{
  case     0ULL: index = -1; break;
  case     1ULL: index =  0; break;
  case     2ULL: index =  1; break;
  case     4ULL: index =  2; break;
  ...
  case 9223372036854775808ULL: index = 63; break;
}

Ugly, but no looping involved.

John Bode 2009-09-25 16:35:43

except that now you introduce rediculous amounts of branching...not efficient either

dharga 2009-09-26 12:58:22

Rob 2009-09-26 15:43:02

@dharga: yeah, I was assuming a switch being implemented as a jump table, but a quick experiment shows that my platform generates the same code for a switch as an if-else chain, so that idea's out. Next best solution would be a lookup table, but a lookup table for 64-bit values would be a bit ... big. A combination of shifting and an 8 or 16-bit lookup table would work better, but at that point you're probably better off converting to double and using log2.

John Bode 2009-09-26 16:52:30

Answer 6

A:

I wrote two functions, they are returning the same result as ffsll().

int func1( uint64_t n ){
  if( n == 0 ) return 0;
  n ^= n-1;
  int i = 0;
  if( n >= 1ull<<32 ){ n>>=32; i+=32; }
  if( n >= 1ull<<16 ){ n>>=16; i+=16; }
  if( n >= 1ull<< 8 ){ n>>= 8; i+= 8; }
  if( n >= 1ull<< 4 ){ n>>= 4; i+= 4; }
  if( n >= 1ull<< 2 ){ n>>= 2; i+= 2; }
  if( n >= 1ull<< 1 ){         i+= 1; }
  return i+1;
}

int func2( uint64_t n ){
  return n? ((union ieee754_float)((float)(n^(n-1)))).ieee.exponent-126: 0;
}

I don't know which is the fastest: ffsll(), func1() or func2() ?

sambowry 2009-09-25 23:04:45

Answer 7

A:

you can halve the complexity of your algorithm checking first if your number is odd or even. If it is even you have the lowest order bit is the first one.

For odd cases you can implement such a binary search...

Lopoc 2009-09-26 00:28:30

that's just checking the first bit then moving on to the rest...

dharga 2009-09-26 12:56:20

ok well... you're right, but it's just an operation a it doesn't involve use of loops and shift, it's just a preliminary check you can choose to act or not!

Lopoc 2009-09-26 13:49:17

Answer 8

+2 A:

This might work for 32 bits. Should be easy enough to extend to 64.

// all bits left of lsb become 1, lsb & right become 0
y = x ^ (-x);

// XOR a shifted copy recovers a single 1 in the lsb's location
u = y ^ (y >> 1);

// .. and isolate the bit in log2 of number of bits
i0 = (u & 0xAAAAAAAA) ?  1 : 0;
i1 = (u & 0xCCCCCCCC) ?  2 : 0;
i2 = (u & 0xF0F0F0F0) ?  4 : 0;
i3 = (u & 0xFF00FF00) ?  8 : 0;
i4 = (u & 0xFFFF0000) ? 16 : 0;
index = i4 | i3 | i2 | i1 | i0;

Obviously, if there is some way to have the hardware do it, i.e., if special CPU instructions are available, that is the way to go.

JustJeff 2009-09-26 14:37:12

Answer 9

+1 A:

If using Visual Studio, _BitScanForward:

http://msdn.microsoft.com/en-us/library/wfd9z0bb(VS.80).aspx (_BitScanForward)

For gcc, try __builtin_ctz or __builtin_ffs:

http://developer.apple.com/mac/library/documentation/DeveloperTools/gcc-4.0.1/gcc/Other-Builtins.html (random gcc intrinsics documentation)

As always the generated code should be consulted to ensure the correct instructions are being generated.

brone 2009-09-26 15:26:37

ansaurus

tags:

views:

answers:

Index of lowest order bit

related questions