views:

2120

answers:

11

If I have some integer n, and I want to know the position of the last set bit (that is, if the least significant bit is on the right, I want to know the position of the furthest left bit that is a 1), what is the quickest/most efficient method of finding out?

I know that POSIX supports a ffs() method in strings.h to find the first set bit, but there doesn't seem to be a corresponding fls() method.

Is there some really obvious way of doing this that I'm missing?

What about in cases where you can't use POSIX functions for portability?

Edit: What about a solution that works on both 32 and 64 bit architectures (many of the code listings seem like they'd only work on 32 bit ints).

+2  A: 

there's a few implementations here:

http://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightLinear

(Edit: After rereading your question, I realise that the link above is for finding the rightmost set bit, not leftmost as you require, although without a sense of word size, it's a tricky one to answer)

spender
That counts zeros on the *right*; the question was about zeros on the left. At least, in a quick skim I don't see it there.
Darius Bacon
Look at the "Log Base 2" algorithms - as Anderson says in the article: "The log base 2 of an integer is the same as the position of the highest bit set (or most significant bit set, MSB)"
Michael Burr
A: 

Although I would probably only use this method if I absolutely required the best possible performance (e.g. for writing some sort of board game AI involving bitboards), the most efficient solution is to use inline ASM. See the Optimisations section of this blog post for code with an explanation.

Noldorin
To expand: the standard loop solution (shifting left and checking MSB) is probably the most readable. As in all cases involving bit twiddling, the speed of ASM can't be beaten, though there's no point cluttering your code unless necessary. Hacks are an in-between solution - go one way or the other.
Noldorin
I'd say taking the logarithm would be a perfectly readable solution (check the generated asm to see if the compiler can optimize it to use this asm instruction)
jalf
Sometimes the inline ASM solution is slower, depending on the implementation in CPU microcode.
rlbond
@rlbound: I can hardly believe that, although I may be mistaken. On any modern CPU one would think that it would get translated to a single instruction....
Noldorin
+11  A: 

See "Number of leading zeros algorithms" in Hacker's Delight (neat book).

Darius Bacon
Hacker's Delight and Anderson's bit twiddling hacks page are definitely the place to go for these kinds of questions.
Michael Burr
+6  A: 

This is sort of like finding a kind of integer log. There are bit-twiddling tricks, but I've made my own tool for this. The goal of course is for speed.

My realization is that the CPU has an automatic bit-detector already, used for integer to float conversion! So use that.

double ff=(double)(v|1);
return ((*(1+(unsigned long *)&ff))>>20)-1023;  // assumes x86 endianness

This version casts the value to a double, then reads off the exponent, which tells you where the bit was. The fancy shift and subtract is to extract the proper parts from the IEEE value.

It's slightly faster to use floats, but a float can only give you the first 24 bit positions because of its smaller precision.

SPWorley
Won't that cause a load hit store on moving between register sets?
Crashworks
Isn't there a cost to loading/reading the FPU (as well as synchronizing with it)?
Michael Burr
Yes. And gcc will do nasty things with code like this with -O2 due to type-aliasing optimizations.
MSN
casting between integer and floating point can be surprisingly expensive on x86 CPU's
jalf
Yep, the FPU costs are high. But actual time measurements showed this was faster than all-bit ops or especially any loops. Try it and take the fastest is always the best advice.I have not had a problem with GCC and -O2 with this though.
SPWorley
You can do this much more portably (and safely) with frexp
Chris Dodd
+1  A: 

Think bitwise operators.

I missunderstood the question the first time. You should produce an int with the leftmost bit set (the others zero). Assuming cmp is set to that value:

position = sizeof(int)*8
while(!(n & cmp)){ 
   n <<=1;
   position--;
}
Vasil
What do you mean converting to a string? The definition of ffs takes an int and returns an int. Where would the conversion be? And what purpose would the conversion serve if we are looking for bits in a word?
dreamlax
I didn't know of that function.
Vasil
+6  A: 

Assuming you're on x86 and game for a bit of inline assembler, Intel provides a BSR instruction ("bit scan reverse"). It's fast on some x86s (microcoded on others). From the manual:

Searches the source operand for the most significant set bit (1 bit). If a most significant 1 bit is found, its bit index is stored in the destination operand. The source operand can be a register or a memory location; the destination operand is a register. The bit index is an unsigned offset from bit 0 of the source operand. If the content source operand is 0, the content of the destination operand is undefined.

(If you're on PowerPC there's a similar cntlz ("count leading zeros") instruction.)

Example code for gcc:

#include <iostream>

int main (int,char**)
{
  int n=1;
  for (;;++n) {
    int msb;
    asm("bsrl %1,%0" : "=r"(msb) : "r"(n));
    std::cout << n << " : " << msb << std::endl;
  }
  return 0;
}

See also this inline assembler tutorial, which shows (section 9.4) it being considerably faster than looping code.

timday
Actually this instruction is usually microcoded into a loop and is rather slow.
rlbond
Which one ? BSR or CNTLZ ? As I read the x86-timing.pdf referenced above, BSR is only slow on the Netburst Pentiums. I know nothing about PowerPC though.
timday
...OK, on closer inspection make that "BSR is only fast on P3/Pentium-M/Core2 x86s". Slow on Netburst and AMD.
timday
cntlzw is a pipelined (not microcoded) op on most PPCs. Looks like it's about as fast as an add on Xenon.
Crashworks
+1  A: 

This should be lightning fast:

int msb(unsigned int v) {
  static const int pos[32] = {0, 1, 28, 2, 29, 14, 24, 3,
    30, 22, 20, 15, 25, 17, 4, 8, 31, 27, 13, 23, 21, 19,
    16, 7, 26, 12, 18, 6, 11, 5, 10, 9};
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  v = (v >> 1) + 1;
  return pos[(v * 0x077CB531UL) >> 27];
}
Protagonist
7 bit shifts, 5 or instructions, a multiplty and a potential cache miss. :)Did you benchmark it, or look at the assembler generated? It *could* end up quite slow, depending on how much of it the compiler can eliminate.
jalf
Hi, just wondering jalf, how do you now the thing with the possible cache miss?
Eduardo
Arno's answer doesn't work beyond 24 bits, rlbond's answer doesn't even work (it returns an answer where the most significant one bit is turned on and all the other bits are turned off--not the answer desired), and Vasil's code fragment isn't complete.
Protagonist
I'm new here. I don't get the negative votes guys. I have provided the only answer with source code that actually works.
Protagonist
The "possible cache miss" is probably due to this code requiring access to its lookup table. If that table is not cached when this is called, there will be a stall while it's fetched. This might make the worst-case performance far worse than the solutions not using a LUT.
unwind
+1  A: 
unsigned int
msb32(register unsigned int x)
{
        x |= (x >> 1);
        x |= (x >> 2);
        x |= (x >> 4);
        x |= (x >> 8);
        x |= (x >> 16);
        return(x & ~(x >> 1));
}

1 register, 13 instructions. Believe it or not, this is usually faster than the BSR instruction mentioned above, which operates in linear time. This is logarithmic time.

From http://aggregate.org/MAGIC/#Most%20Significant%201%20Bit

rlbond
The above code doesn't answer the question. It returns an unsigned integer where the most significant on bit in x remains on and all the other bits are turned off. The question was to return the *position* of the most significant on bit.
Protagonist
+1  A: 

GCC has

 -- Built-in Function: int __builtin_clz (unsigned int x)
     Returns the number of leading 0-bits in X, starting at the most
     significant bit position.  If X is 0, the result is undefined.

 -- Built-in Function: int __builtin_clzl (unsigned long)
     Similar to `__builtin_clz', except the argument type is `unsigned
     long'.

 -- Built-in Function: int __builtin_clzll (unsigned long long)
     Similar to `__builtin_clz', except the argument type is `unsigned
     long long'.

I'd expect them to be translated into something reasonably efficient for your current platform, whether it be one of those fancy bit-twiddling algorithms, or a single instruction.

ephemient
A: 

One approach could be to keep shifting left till the number becomes negative.

Here is the code:

Funct()

{

int number; int count;

while(number > 0)

{

number = number << 1;

count++;

}

printf("It is the no "%d" bit from the left", (count+1));

}

A: 
int a, b = 32769;
a = (b & 0x8000) >> 15;
Pankaj