I need to find the smallest power of two that's greater than or equal to a given value. So far, I have this:

int value = 3221; // 3221 is just an example, could be any number
int result = 1;

while (result < value) result <<= 1;

It works fine, but feels kind of naive. Is there a better algorithm for that problem?

EDIT. There were some nice Assembler suggestions, so I'm adding those tags to the question.

+5  A: 
ceil(log2(value))

ilog2() can be calculated in 3 asm instructions e.g., http://www.asterisk.org/doxygen/1.4/log2comp_8h-source.html

J.F. Sebastian
That's something I considered, but won't it be slower than my bit-pushing solution?
Boyan
I think this will be slower than the loop mentioned in the question
Naveen
There's nothing inherently slow here: taking the result of log2(value) and rounding it up to the nearest whole number can't be beaten for efficiency when the log table lookups are all pre-computed.
Supernovah
Log-base-2 generally takes a float as an argument. Are you saying you have a lookup table with an entry for every possible float? I hope not... Of course, the fastest way is a lookup table with 2^32 entries but that's a bit memory-expensive.
paxdiablo
Yes, quite a bit...
Boyan
@J.F.: If we throw in some Assembler, your solution really looks better. Thanks for the link!
Boyan
+6  A: 

Your implementation is not naive, it's actually the logical one, except that it's wrong - it returns a negative for numbers greater than 1/2 the maximum integer size.

Assuming you can restrict numbers to the range 0 through 2^30 (for 32-bit ints), it'll work just fine, and a lot faster than any mathematical functions involving logarithms.

Unsigned ints would work better but you'd end up with an infinite loop (for numbers greater than 2^31) since you can never reach 2^32 with the << operator.
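
For illustration, here's the question's loop with an unsigned type and an explicit range guard (the guard and the function name are my additions):

```c
#include <assert.h>

/* the loop from the question, sketched with an unsigned type and an
   explicit guard so inputs above 2^31 can't shift forever; the guard
   and the name next_pow2_loop are mine */
unsigned next_pow2_loop(unsigned value)
{
    assert(value <= 0x80000000u);   /* 2^31 is the largest 32-bit power of 2 */
    unsigned result = 1;
    while (result < value) result <<= 1;
    return result;
}
```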

paxdiablo
Yes, the values would be greater than zero and much less than 2^31.
Boyan
Then what you have is about as fast as it's going to get. I don't doubt there's a Boolean algebra solution that'll do it in two or three operations, but you'd sacrifice lots of readability for a very small performance gain.
paxdiablo
I think you might be able to speed it up using a binary search: Initialize result with the (statically) computed middle of the number range and shift left / right. The more steps you precompute on this search tree, the lower the average shift-amount becomes
Tetha
I just don't know if this will really be faster, because for many ranges the solution up there could be faster despite the higher number of shifts.
Tetha
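
Tetha's binary-search idea above could be sketched like this (the function name and the choice of 2^16 as the precomputed midpoint are mine):

```c
/* a sketch of Tetha's suggestion: start the probe at the middle of
   the 32-bit range and walk up or down, so at most ~16 shifts are
   needed instead of up to 31 */
unsigned next_pow2_mid(unsigned value)
{
    unsigned result = 1u << 16;     /* statically chosen midpoint */
    if (result < value) {
        while (result < value) result <<= 1;
    } else {
        while (result > 1 && (result >> 1) >= value) result >>= 1;
    }
    return result;
}
```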
+6  A: 

On Intel hardware the BSR instruction is close to what you want - it finds the most-significant set bit. If you need to be more precise you can then wonder if the remaining bits are precisely zero or not. I tend to assume that other CPUs will have something like BSR - this is a question you want answered to normalize a number. If your number is more than 32 bits then you would do a scan from your most-significant DWORD to find the first DWORD with ANY bits set. Edsger Dijkstra would likely remark that the above "algorithms" assume that your computer uses binary digits, while from his lofty "algorithmic" perspective you should think about Turing machines or something - obviously I am of the more pragmatic style.

pngaz
I looked in MSDN and there is a compiler intrinsic called _BitScanReverse() - this is better than assembler because you can't do inline assembler in x64 and you don't want to waste a procedure call to an external x64 routine. Assuming you're using MS compilers, of course.
pngaz
There is a bit of cleanup to do, since both 4 and 5 will return 2 for the MSB, while the right answers for those values are 4 and 8 respectively. Nevertheless, I like the BSR solution -- I tend to forget about that instruction.
DocMax
Ah, my mistake. I had misread the approach. I do not know what you are thinking for the check, but you can avoid a branch by using BSF like: mov eax,x; bsr ecx,eax; bsf edx,eax; sub edx,ecx; adc ecx,0; mov ebx,1; shl ebx,cl. (And assembly on a single line reads like a chess game. :)
DocMax
pow2roundup() above with the 8 votes solves exactly the problem posed. To create a normalized FP number you must still find the index of the MSB, so BSR does have a mission, but it's work you don't need if ALL you need is roundup - must admit I didn't know of pow2roundup() - clever!
pngaz
Google Query (( BSR XEON CLOCK )) has 3rd hit "Nearest power of 2 - GameDev.Net Discussion Forums". These guys benched BSR and a Flt-Pt-Trick, evidently such things are constant-time on XEON. _BitScanReverse() is not mentioned - inlining matters as the actual operation is so fast on such machines.
pngaz
A: 

The code below repeatedly strips the lowest bit off until the number is a power of two, then doubles the result unless the number is a power of two to begin with. It has the advantage of running in a time proportional to the number of bits set. Unfortunately, it has the disadvantage of requiring more instructions in almost all cases than either the code in the question or the assembly suggestions. I include it only for completeness.

int nextPow(int x) {
  int y = x;
  while (x &= (x ^ (~x + 1)))
    y = x << 1;
  return y;
}
DocMax
+16  A: 

Here's my favorite. Other than the initial check for whether it's invalid (<0, which you could skip if you knew you'd only have >=0 numbers passed in), it has no loops or conditionals, and thus will outperform most other methods. This is similar to erickson's answer, but I think that my decrementing x at the beginning and adding 1 at the end is a little less awkward than his answer (and also avoids the conditional at the end).

/// Round up to next higher power of 2 (return x if it's already a power
/// of 2).
inline int
pow2roundup (int x)
{
    if (x < 0)
        return 0;
    --x;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x+1;
}
Larry Gritz
Except for minor syntax differences and the addition of that initial check, yours is also nearly identical to the version given in Henry S. Warren, Jr.'s "Hacker's Delight".
Boojum
@Boojum: Thanks for mentioning that book. I've checked it out and it's got the solution I need, plus much more!
Boyan
Maybe this solution will be slower than doing it in Assembler, but it has the advantage of being completely portable.
Boyan
So was your solution, @Boyan, and I think yours was a little more readable.
paxdiablo
@Pax: Yes, but this one executes in constant time. And it's not that hard to understand, especially if you run it two-three times through the debugger :)
Boyan
Wouldn't "if (x <= 0)" be better than "if (x < 0)" ?
Dipstick
I agree, this is a better version of my approach (which I deleted). +1!
erickson
Well, I guess it's time to call it a day. Probably can't get more efficient than that solution. Thanks to all!
Boyan
@Boyan: This solution is not portable e.g., how does it work on x64? (it doesn't)
J.F. Sebastian
@J.F.: It works with a minor change. Could be done for both architectures with conditional compilation.
Boyan
A lookup table based approach can be up to twice as fast as this one.
Sparr
Actually, this algorithm resembles what a multiply instruction does. I wonder if there's a way to make use of that?
Mike Dunlavey
+2  A: 

You don't really say what you mean by "better algorithm" but as the one you present is perfectly clear (if somewhat flawed), I'll assume you are after a more efficient algorithm.

Larry Gritz has given what is probably the most efficient C/C++ algorithm without the overhead of a lookup table, and it would suffice in most cases (see http://www.hackersdelight.org for similar algorithms).

As mentioned elsewhere, most CPUs these days have machine instructions to count the number of leading zeroes (or, equivalently, return the most-significant set bit), but their use is non-portable and, in most cases, not worth the effort.

However, most compilers have "intrinsic" functions that allow the use of these machine instructions in a more portable way.

Microsoft C++ has _BitScanReverse() and gcc provides __builtin_clz() to do the bulk of the work efficiently.
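
A sketch of the intrinsic route with gcc (the wrapper and its name are mine, not a canonical API):

```c
/* a sketch using gcc's __builtin_clz (count leading zeros), which
   typically compiles down to BSR/LZCNT on x86; the wrapper name is
   mine.  __builtin_clz(0) is undefined, hence the early return. */
unsigned next_pow2_clz(unsigned v)
{
    if (v <= 1) return 1;
    int msb = 31 - __builtin_clz(v - 1);  /* index of highest set bit of v-1 */
    return 1u << (msb + 1);
}
```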

Dipstick
+2  A: 

An exploration of the possible solutions to a closely related problem (that is, rounding down instead of up), many of which are significantly faster than the naive approach, is available on the Bit Twiddling Hacks page, an excellent resource for the kinds of optimization you are looking for. The fastest solution there uses a lookup table with 256 entries, which reduces the total operation count to around 7, from an average of 62 (by a similar operation-counting methodology) for the naive approach. Adapting those solutions to your problem is a matter of a single comparison and increment.
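
As an illustration, a byte-wide table for floor(log2) plus a final comparison-and-shift might look like this (the table, the init code, and all the names are mine, adapted from the idea described on that page rather than copied from it):

```c
/* a sketch of the byte-wide lookup-table approach: log_table[b] is
   floor(log2(b)) for one byte; table, init, and names are mine,
   following the idea described on the Bit Twiddling Hacks page */
static unsigned char log_table[256];

static void init_log_table(void)
{
    log_table[0] = 0;                 /* log2(0) is undefined; unused */
    log_table[1] = 0;
    for (int i = 2; i < 256; ++i)
        log_table[i] = log_table[i >> 1] + 1;
}

static unsigned floor_log2(unsigned v)
{
    unsigned t;
    if ((t = v >> 24)) return 24 + log_table[t];
    if ((t = v >> 16)) return 16 + log_table[t];
    if ((t = v >> 8))  return  8 + log_table[t];
    return log_table[v];
}

/* round down, then one comparison and a shift to round up */
unsigned next_pow2_table(unsigned v)
{
    if (v <= 1) return 1;
    unsigned p = 1u << floor_log2(v);
    return (p < v) ? p << 1 : p;
}
```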

Sparr
Thanks for the link, nice stuff there!
Boyan
+1  A: 

I know this is downvote-bait, but if the number is small enough (8 or 16 bits) a direct lookup might be fastest.

// fill in the table: tab[i] = highest power of 2 <= i (tab[0] = 0)
unsigned short tab[65536];
tab[0] = 0; tab[1] = 1;
for (unsigned int j = 2; j < 65536; ++j) tab[j] = tab[j >> 1] << 1;
unsigned short bit = tab[i];    // i is the 16-bit input value

It might be possible to extend it to 32 bits by first doing the high word and then the low.

//
unsigned long bitHigh = ((unsigned long)tab[(unsigned short)(i >> 16)]) << 16;
unsigned long bitLow = 0;
if (bitHigh == 0){
    bitLow = tab[(unsigned short)(i & 0xffff)];
}
unsigned long answer = bitHigh | bitLow;

It's probably no better than the shift-or methods, but maybe could be extended to larger word sizes.

(Actually, this gives the highest 1-bit. You'd have to shift it left by 1 to get the next higher power of 2.)

Mike Dunlavey
Using a 256-entry lookup table and combining the results for all 4 bytes of a 4-byte word is quite viable. The memory/speed tradeoff of a 65536-entry table is not great (14% speedup, 25500% memory increase).
Sparr
@Sparr. You're right. It's hard to beat the shift-or method, but it's interesting to keep an eye out for more ways to skin cats.
Mike Dunlavey
Once I heard a Dijkstra lecture on fun algorithms. He had a student who did an n-bit rotate by doing a sequence of 3 bit-reversals.
Mike Dunlavey
+2  A: 

In the spirit of Quake II's 0x5f3759df and the Bit Twiddling Hacks' IEEE version - this solution reaches into a double to extract the exponent as a means to calculate floor(lg2(n)). It's a bit faster than the accepted solution and much faster than the Bit Twiddling IEEE version since it avoids floating point math. As coded, it assumes a double is a real*8 IEEE float on a little endian machine.

int nextPow2(int n) 
{ 
    if ( n <= 1 ) return n;
    double d = n-1; 
    return 1 << ((((int*)&d)[1]>>20)-1022); 
}

Edit: Added an optimized x86 assembly version with the help of a co-worker. A 4% speed gain, but still about 50% slower than a BSR version (6 sec vs 4 on my laptop for n=1..2^31-2).

int nextPow2(int n) 
{ 
    if ( n <= 1 ) return n;
    double d;
    n--;
    __asm {
      fild    n 
      mov     eax,4
      fstp    d 
      mov     ecx, dword ptr d[eax]
      sar     ecx,14h 
      rol     eax,cl 
  }
}
Tony Lee
A: 

My version of the same:

int pwr2Test(size_t x) {
    return (x & (x - 1))? 0 : 1; 
}

size_t pwr2Floor(size_t x) {
    // A lookup table for rounding 4-bit numbers down to
    // the nearest power of 2.
    static const unsigned char pwr2lut[] = {
        0x00, 0x01, 0x02, 0x02,     //  0,  1,  2,  3
        0x04, 0x04, 0x04, 0x04,     //  4,  5,  6,  7
        0x08, 0x08, 0x08, 0x08,     //  8,  9, 10, 11
        0x08, 0x08, 0x08, 0x08      // 12, 13, 14, 15
    };

    size_t pwr2 = 0;                // The return value
    unsigned int i = 0;             // The nybble interator

    for( i = 0; x != 0; ++i ) {     // Iterate through nybbles,
        pwr2 = pwr2lut[x & 0x0f];   // rounding down to powers of 2.
        x >>= 4;                    // (i - 1) will contain the
    }                               // highest non-zero nybble index.

    i = i? (i - 1) : i;
    pwr2 <<= (i * 4);
    return pwr2; 
}

size_t pwr2Size(size_t x) {
    if( pwr2Test(x) ) { return x; }
    return pwr2Floor(x) * 2;
}
natersoz