tags:

views:

140

answers:

4

Possible Duplicates:
Previous power of 2
Getting the Leftmost Bit

What I want is, suppose there is a number 5 i.e. 101. My answer should be 100. For 9 i.e. 1001, the answer should be 1000

+4  A: 
int input = 5;
std::size_t numBits = 0;
while(input)
{
    input >>= 1;
    numBits++;
}
int output = 1 << (numBits-1);
Billy ONeal
+6  A: 

From Hacker's Delight, a nice branchless solution:

uint32_t flp2 (uint32_t x)
{
    x = x | (x >> 1);
    x = x | (x >> 2);
    x = x | (x >> 4);
    x = x | (x >> 8);
    x = x | (x >> 16);
    return x - (x >> 1);
}

This typically takes 12 instructions. You can do it in fewer if your CPU has a "count leading zeroes" instruction.

Paul R
+1 :) (in 15 chars)
Billy ONeal
It is better if you just vote to close as exact duplicate... instead of adding yet the same answer again.
David Rodríguez - dribeas
@David: I would have done if I'd had any close votes left for today. As it is I posted links to two duplicates in the comments above, assuming that others would vote to close.
Paul R
+7  A: 

You can't ask for the fastest sequence without giving constrains on the machine on which this has to run. For example, some machines support an instruction called "count leading zeroes" or have means to emulate it very quickly. If you can access this instruction (for example with gcc) then you can write:

#include <limits.h>
#include <stdint.h>
uint32_t f(uint32_t x) 
{
    return ((uint64_t)1)<<(32-__builtin_clz(x)-1);
}
int main()
{
    printf("=>%d\n",f(5));
    printf("=>%d\n",f(9));
}

f(x) returns what you want (the least y with x>=y and y=2**n). The compiler will now generate the optimal code sequence for the target machine. For example, when compiling for a default x86_64 architecture, f() looks like this:

    bsrl    %edi, %edi
    movl    $31, %ecx
    movl    $1, %eax
    xorl    $31, %edi
    subl    %edi, %ecx
    salq    %cl, %rax
    ret

You see, no loops here! 7 instructions, no branches.

But if I tell my compiler (gcc-4.5) to optimize for the machine I'm using right now (AMD Phenom-II), then this comes out for f():

    bsrl    %edi, %ecx
    movl    $1, %eax
    salq    %cl, %rax
    ret

This is probably the fastest way to go for this machine.

EDIT: f(0) would have resulted in UB, I've fixed that (and the assembly). Also, uint32_t means that I can write 32 without feeling guilty :-)

Luther Blissett
You're relying on a very platform specific assembly instruction, but bothering to use compatibility for machines with 7 or 9 bit chars? (`CHAR_BIT`)? :P (But +1 though)
Billy ONeal
IIRC, then all recent versions of gcc have __builtin_clz(x), so basically this code should be fine for all machines with gcc support.
Luther Blissett
Well, CHAR_BIT is a nice hint why this 8 is an 8. Just dropping the 8 in there would have made the code less clear. :oP indeed.
Nathon
Rather than `(uint64_t)1`, you could write `1ULL`.
caf
+1  A: 

This is a task related to the bit counting. Check this out.

Using the 2a (which is my favorite of the algorithms; not the fastest) one can come up with this:

int highest_bit_mask (unsigned int n)  {
   while (n)  {
      if (n & (n-1)) {
         n &= (n-1) ;
      } else {
         return n;
      }
   }
   return 0;
}

The magic of n &= (n-1); is that it removes from n the least significant bit. (Corollary: n & (n-1) is false only when n has precisely one bit set.) The algorithm complexity depends on number of bits set in the input.

Check out the link anyway. It is a very amusing and enlightening read which might give you more ideas.

Dummy00001