ansaurus

Question

What is fastest method to calculate a number having only bit set which is the most significant digit set in another number?

Answer 1

+4 A:

int input = 5;
std::size_t numBits = 0;
while(input)
{
    input >>= 1;
    numBits++;
}
int output = 1 << (numBits-1);

Billy ONeal 2010-08-04 15:27:43

Answer 2

+6 A:

From Hacker's Delight, a nice branchless solution:

uint32_t flp2 (uint32_t x)
{
    x = x | (x >> 1);
    x = x | (x >> 2);
    x = x | (x >> 4);
    x = x | (x >> 8);
    x = x | (x >> 16);
    return x - (x >> 1);
}

This typically takes 12 instructions. You can do it in fewer if your CPU has a "count leading zeroes" instruction.

Paul R 2010-08-04 15:32:46

+1 :) (in 15 chars)

Billy ONeal 2010-08-04 15:35:00

It is better if you just vote to close as exact duplicate... instead of adding yet the same answer again.

David Rodríguez - dribeas 2010-08-04 15:54:28

@David: I would have done if I'd had any close votes left for today. As it is I posted links to two duplicates in the comments above, assuming that others would vote to close.

Paul R 2010-08-04 16:02:34

Answer 3

+7 A:

You can't ask for the fastest sequence without giving constrains on the machine on which this has to run. For example, some machines support an instruction called "count leading zeroes" or have means to emulate it very quickly. If you can access this instruction (for example with gcc) then you can write:

#include <limits.h>
#include <stdint.h>
uint32_t f(uint32_t x) 
{
    return ((uint64_t)1)<<(32-__builtin_clz(x)-1);
}
int main()
{
    printf("=>%d\n",f(5));
    printf("=>%d\n",f(9));
}

f(x) returns what you want (the least y with x>=y and y=2**n). The compiler will now generate the optimal code sequence for the target machine. For example, when compiling for a default x86_64 architecture, f() looks like this:

    bsrl    %edi, %edi
    movl    $31, %ecx
    movl    $1, %eax
    xorl    $31, %edi
    subl    %edi, %ecx
    salq    %cl, %rax
    ret

You see, no loops here! 7 instructions, no branches.

But if I tell my compiler (gcc-4.5) to optimize for the machine I'm using right now (AMD Phenom-II), then this comes out for f():

    bsrl    %edi, %ecx
    movl    $1, %eax
    salq    %cl, %rax
    ret

This is probably the fastest way to go for this machine.

EDIT: f(0) would have resulted in UB, I've fixed that (and the assembly). Also, uint32_t means that I can write 32 without feeling guilty :-)

Luther Blissett 2010-08-04 15:42:03

You're relying on a very platform specific assembly instruction, but bothering to use compatibility for machines with 7 or 9 bit chars? (`CHAR_BIT`)? :P (But +1 though)

Billy ONeal 2010-08-04 15:47:19

IIRC, then all recent versions of gcc have __builtin_clz(x), so basically this code should be fine for all machines with gcc support.

Luther Blissett 2010-08-04 15:52:42

Well, CHAR_BIT is a nice hint why this 8 is an 8. Just dropping the 8 in there would have made the code less clear. :oP indeed.

Nathon 2010-08-04 15:54:21

Rather than `(uint64_t)1`, you could write `1ULL`.

caf 2010-08-05 00:16:59

Answer 4

+1 A:

This is a task related to the bit counting. Check this out.

Using the 2a (which is my favorite of the algorithms; not the fastest) one can come up with this:

int highest_bit_mask (unsigned int n)  {
   while (n)  {
      if (n & (n-1)) {
         n &= (n-1) ;
      } else {
         return n;
      }
   }
   return 0;
}

The magic of n &= (n-1); is that it removes from n the least significant bit. (Corollary: n & (n-1) is false only when n has precisely one bit set.) The algorithm complexity depends on number of bits set in the input.

Check out the link anyway. It is a very amusing and enlightening read which might give you more ideas.

Dummy00001 2010-08-04 16:09:13

ansaurus

tags:

views:

answers:

What is fastest method to calculate a number having only bit set which is the most significant digit set in another number?

related questions