ansaurus

Question

What is the fastest/most efficient way to find the last set bit (msb) in an integer in C?

Answer 1

+2 A:

there's a few implementations here:

http://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightLinear

(Edit: After rereading your question, I realise that the link above is for finding the rightmost set bit, not leftmost as you require, although without a sense of word size, it's a tricky one to answer)

spender 2009-03-22 23:44:42

That counts zeros on the *right*; the question was about zeros on the left. At least, in a quick skim I don't see it there.

Darius Bacon 2009-03-22 23:49:47

Look at the "Log Base 2" algorithms - as Anderson says in the article: "The log base 2 of an integer is the same as the position of the highest bit set (or most significant bit set, MSB)"

Michael Burr 2009-03-23 00:46:28

Answer 2

A:

Although I would probably only use this method if I absolutely required the best possible performance (e.g. for writing some sort of board game AI involving bitboards), the most efficient solution is to use inline ASM. See the Optimisations section of this blog post for code with an explanation.

Noldorin 2009-03-22 23:46:27

To expand: the standard loop solution (shifting left and checking MSB) is probably the most readable. As in all cases involving bit twiddling, the speed of ASM can't be beaten, though there's no point cluttering your code unless necessary. Hacks are an in-between solution - go one way or the other.

Noldorin 2009-03-23 00:05:05

I'd say taking the logarithm would be a perfectly readable solution (check the generated asm to see if the compiler can optimize it to use this asm instruction)

jalf 2009-03-23 01:22:43

Sometimes the inline ASM solution is slower, depending on the implementation in CPU microcode.

rlbond 2009-03-23 04:39:37

@rlbound: I can hardly believe that, although I may be mistaken. On any modern CPU one would think that it would get translated to a single instruction....

Noldorin 2009-03-23 12:04:53

Answer 3

+11 A:

See "Number of leading zeros algorithms" in Hacker's Delight (neat book).

Darius Bacon 2009-03-22 23:47:51

Hacker's Delight and Anderson's bit twiddling hacks page are definitely the place to go for these kinds of questions.

Michael Burr 2009-03-23 00:48:55

Answer 4

+6 A:

This is sort of like finding a kind of integer log. There are bit-twiddling tricks, but I've made my own tool for this. The goal of course is for speed.

My realization is that the CPU has an automatic bit-detector already, used for integer to float conversion! So use that.

double ff=(double)(v|1);
return ((*(1+(unsigned long *)&ff))>>20)-1023;  // assumes x86 endianness

This version casts the value to a double, then reads off the exponent, which tells you where the bit was. The fancy shift and subtract is to extract the proper parts from the IEEE value.

It's slightly faster to use floats, but a float can only give you the first 24 bit positions because of its smaller precision.

SPWorley 2009-03-22 23:49:25

Won't that cause a load hit store on moving between register sets?

Crashworks 2009-03-23 00:43:59

Isn't there a cost to loading/reading the FPU (as well as synchronizing with it)?

Michael Burr 2009-03-23 00:47:59

Yes. And gcc will do nasty things with code like this with -O2 due to type-aliasing optimizations.

MSN 2009-03-23 00:51:42

casting between integer and floating point can be surprisingly expensive on x86 CPU's

jalf 2009-03-23 01:23:43

Yep, the FPU costs are high. But actual time measurements showed this was faster than all-bit ops or especially any loops. Try it and take the fastest is always the best advice.I have not had a problem with GCC and -O2 with this though.

SPWorley 2009-03-23 02:02:21

You can do this much more portably (and safely) with frexp

Chris Dodd 2009-03-24 23:35:59

Answer 5

+1 A:

Think bitwise operators.

I missunderstood the question the first time. You should produce an int with the leftmost bit set (the others zero). Assuming cmp is set to that value:

position = sizeof(int)*8
while(!(n & cmp)){ 
   n <<=1;
   position--;
}

Vasil 2009-03-22 23:51:08

What do you mean converting to a string? The definition of ffs takes an int and returns an int. Where would the conversion be? And what purpose would the conversion serve if we are looking for bits in a word?

dreamlax 2009-03-23 03:31:51

I didn't know of that function.

Vasil 2009-03-23 03:37:38

Answer 6

+6 A:

Assuming you're on x86 and game for a bit of inline assembler, Intel provides a BSR instruction ("bit scan reverse"). It's fast on some x86s (microcoded on others). From the manual:

Searches the source operand for the most significant set bit (1 bit). If a most significant 1 bit is found, its bit index is stored in the destination operand. The source operand can be a register or a memory location; the destination operand is a register. The bit index is an unsigned offset from bit 0 of the source operand. If the content source operand is 0, the content of the destination operand is undefined.

(If you're on PowerPC there's a similar cntlz ("count leading zeros") instruction.)

Example code for gcc:

#include <iostream>

int main (int,char**)
{
  int n=1;
  for (;;++n) {
    int msb;
    asm("bsrl %1,%0" : "=r"(msb) : "r"(n));
    std::cout << n << " : " << msb << std::endl;
  }
  return 0;
}

See also this inline assembler tutorial, which shows (section 9.4) it being considerably faster than looping code.

timday 2009-03-23 00:00:51

Actually this instruction is usually microcoded into a loop and is rather slow.

rlbond 2009-03-23 03:10:54

Which one ? BSR or CNTLZ ? As I read the x86-timing.pdf referenced above, BSR is only slow on the Netburst Pentiums. I know nothing about PowerPC though.

timday 2009-03-23 09:26:07

...OK, on closer inspection make that "BSR is only fast on P3/Pentium-M/Core2 x86s". Slow on Netburst and AMD.

timday 2009-03-23 09:29:23

cntlzw is a pipelined (not microcoded) op on most PPCs. Looks like it's about as fast as an add on Xenon.

Crashworks 2009-03-23 19:28:56

Answer 7

+1 A:

This should be lightning fast:

int msb(unsigned int v) {
  static const int pos[32] = {0, 1, 28, 2, 29, 14, 24, 3,
    30, 22, 20, 15, 25, 17, 4, 8, 31, 27, 13, 23, 21, 19,
    16, 7, 26, 12, 18, 6, 11, 5, 10, 9};
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  v = (v >> 1) + 1;
  return pos[(v * 0x077CB531UL) >> 27];
}

Protagonist 2009-03-23 00:32:11

7 bit shifts, 5 or instructions, a multiplty and a potential cache miss. :)Did you benchmark it, or look at the assembler generated? It *could* end up quite slow, depending on how much of it the compiler can eliminate.

jalf 2009-03-23 01:30:04

Hi, just wondering jalf, how do you now the thing with the possible cache miss?

Eduardo 2009-03-23 19:33:25

Arno's answer doesn't work beyond 24 bits, rlbond's answer doesn't even work (it returns an answer where the most significant one bit is turned on and all the other bits are turned off--not the answer desired), and Vasil's code fragment isn't complete.

Protagonist 2009-03-23 19:56:01

I'm new here. I don't get the negative votes guys. I have provided the only answer with source code that actually works.

Protagonist 2009-03-23 19:59:22

The "possible cache miss" is probably due to this code requiring access to its lookup table. If that table is not cached when this is called, there will be a stall while it's fetched. This might make the worst-case performance far worse than the solutions not using a LUT.

unwind 2009-03-24 15:00:32

Answer 8

+1 A:

unsigned int
msb32(register unsigned int x)
{
        x |= (x >> 1);
        x |= (x >> 2);
        x |= (x >> 4);
        x |= (x >> 8);
        x |= (x >> 16);
        return(x & ~(x >> 1));
}

1 register, 13 instructions. Believe it or not, this is usually faster than the BSR instruction mentioned above, which operates in linear time. This is logarithmic time.

From http://aggregate.org/MAGIC/#Most%20Significant%201%20Bit

rlbond 2009-03-23 03:21:45

The above code doesn't answer the question. It returns an unsigned integer where the most significant on bit in x remains on and all the other bits are turned off. The question was to return the *position* of the most significant on bit.

Protagonist 2009-03-23 21:55:22

Answer 9

+1 A:

GCC has

 -- Built-in Function: int __builtin_clz (unsigned int x)
     Returns the number of leading 0-bits in X, starting at the most
     significant bit position.  If X is 0, the result is undefined.

 -- Built-in Function: int __builtin_clzl (unsigned long)
     Similar to `__builtin_clz', except the argument type is `unsigned
     long'.

 -- Built-in Function: int __builtin_clzll (unsigned long long)
     Similar to `__builtin_clz', except the argument type is `unsigned
     long long'.

I'd expect them to be translated into something reasonably efficient for your current platform, whether it be one of those fancy bit-twiddling algorithms, or a single instruction.

ephemient 2009-03-23 15:16:34

Answer 10

A:

One approach could be to keep shifting left till the number becomes negative.

Here is the code:

Funct()

{

int number; int count;

while(number > 0)

{

number = number << 1;

count++;

}

printf("It is the no "%d" bit from the left", (count+1));

}

2009-08-22 05:43:28

Answer 11

A:

int a, b = 32769;
a = (b & 0x8000) >> 15;

Pankaj 2009-10-07 05:52:12

ansaurus

tags:

views:

answers:

What is the fastest/most efficient way to find the last set bit (msb) in an integer in C?

related questions