I need to find the smallest power of two that's greater than or equal to a given value. So far, I have this:

int value = 3221; // 3221 is just an example, could be any number
int result = 1;

while (result < value) result <<= 1;

It works fine, but feels kind of naive. Is there a better algorithm for that problem?

EDIT. There were some nice Assembler suggestions, so I'm adding those tags to the question.

+5  A: 
ceil(log2(value))

ilog2() can be calculated in 3 asm instructions e.g., http://www.asterisk.org/doxygen/1.4/log2comp_8h-source.html

J.F. Sebastian
That's something I considered, but won't it be slower than my bit-pushing solution?
Boyan
I think this will be slower than the loop mentioned in the question
Naveen
There's nothing inherently slow here: taking the result of log2(value) and rounding it up to the nearest whole number can't be beaten for efficiency when the log table lookups are all pre-computed.
Supernovah
Log-base-2 generally takes a float as an argument. Are you saying you have a lookup table with an entry for every possible float? I hope not... Of course, the fastest way is a lookup table with 2^32 entries but that's a bit memory-expensive.
paxdiablo
Yes, quite a bit...
Boyan
@J.F.: If we throw in some Assembler, your solution really looks better. Thanks for the link!
Boyan
+6  A: 

Your implementation is not naive, it's actually the logical one, except that it's wrong - it returns a negative for numbers greater than 1/2 the maximum integer size.

Assuming you can restrict numbers to the range 0 through 2^30 (for 32-bit ints), it'll work just fine, and a lot faster than any mathematical functions involving logarithms.

Unsigned ints would work better but you'd end up with an infinite loop (for numbers greater than 2^31) since you can never reach 2^32 with the << operator.
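
For illustration, here's the question's loop with an unsigned type and an explicit range guard (the guard and the function name are my additions):

```c
#include <assert.h>

/* the loop from the question, sketched with an unsigned type and an
   explicit guard so inputs above 2^31 can't shift forever; the guard
   and the name next_pow2_loop are mine */
unsigned next_pow2_loop(unsigned value)
{
    assert(value <= 0x80000000u);   /* 2^31 is the largest 32-bit power of 2 */
    unsigned result = 1;
    while (result < value) result <<= 1;
    return result;
}
```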

paxdiablo
Yes, the values would be greater than zero and much less than 2^31.
Boyan
Then what you have is about as fast as it's going to get. I don't doubt there's a Boolean algebra solution that'll do it in two or three operations, but you'd sacrifice lots of readability for a very small performance gain.
paxdiablo
I think you might be able to speed it up using a binary search: Initialize result with the (statically) computed middle of the number range and shift left / right. The more steps you precompute on this search tree, the lower the average shift-amount becomes
Tetha
I just don't know if this will really be faster, because for many ranges the solution up there could be faster despite the higher number of shifts.
Tetha
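
Tetha's binary-search idea above could be sketched like this (the function name and the choice of 2^16 as the precomputed midpoint are mine):

```c
/* a sketch of Tetha's suggestion: start the probe at the middle of
   the 32-bit range and walk up or down, so at most ~16 shifts are
   needed instead of up to 31 */
unsigned next_pow2_mid(unsigned value)
{
    unsigned result = 1u << 16;     /* statically chosen midpoint */
    if (result < value) {
        while (result < value) result <<= 1;
    } else {
        while (result > 1 && (result >> 1) >= value) result >>= 1;
    }
    return result;
}
```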
+6  A: 

On Intel hardware the BSR instruction is close to what you want - it finds the most-significant set bit. If you need to be more precise you can then wonder if the remaining bits are precisely zero or not. I tend to assume that other CPUs will have something like BSR - this is a question you want answered to normalize a number. If your number is more than 32 bits then you would do a scan from your most-significant DWORD to find the first DWORD with ANY bits set. Edsger Dijkstra would likely remark that the above "algorithms" assume that your computer uses binary digits, while from his lofty "algorithmic" perspective you should think about Turing machines or something - obviously I am of the more pragmatic style.

pngaz
I looked in MSDN and there is a compiler intrinsic called _BitScanReverse() - this is better than assembler because you can't do inline assembler in x64 and you don't want to waste a procedure call to an external x64 routine. Assuming you're using MS compilers, of course.
pngaz
There is a bit of cleanup to do, since both 4 and 5 will return 2 for the MSB, while the right answers for those values are 4 and 8 respectively. Nevertheless, I like the BSR solution -- I tend to forget about that instruction.
DocMax
Ah, my mistake. I had misread the approach. I do not know what you are thinking for the check, but you can avoid a branch by using BSF like: mov eax,x; bsr ecx,eax; bsf edx,eax; sub edx,ecx; adc ecx,0; mov ebx,1; shl ebx,cl. (And assembly on a single line reads like a chess game. :)
DocMax
pow2roundup() above with the 8 votes solves exactly the problem posed. To create a normalized FP number you must still find the index of the MSB, so BSR does have a mission, but it's work you don't need if ALL you need is roundup - must admit I didn't know of pow2roundup() - clever!
pngaz
Google Query (( BSR XEON CLOCK )) has 3rd hit "Nearest power of 2 - GameDev.Net Discussion Forums". These guys benched BSR and a Flt-Pt-Trick, evidently such things are constant-time on XEON. _BitScanReverse() is not mentioned - inlining matters as the actual operation is so fast on such machines.
pngaz
A: 

The code below repeatedly strips the lowest bit off until the number is a power of two, then doubles the result unless the number is a power of two to begin with. It has the advantage of running in a time proportional to the number of bits set. Unfortunately, it has the disadvantage of requiring more instructions in almost all cases than either the code in the question or the assembly suggestions. I include it only for completeness.

int nextPow(int x) {
  int y = x;
  while (x &= (x ^ (~x + 1)))
    y = x << 1;
  return y;
}
DocMax
+16  A: 

Here's my favorite. Other than the initial check for whether it's invalid (<0, which you could skip if you knew you'd only have >=0 numbers passed in), it has no loops or conditionals, and thus will outperform most other methods. This is similar to erickson's answer, but I think that my decrementing x at the beginning and adding 1 at the end is a little less awkward than his answer (and also avoids the conditional at the end).

/// Round up to next higher power of 2 (return x if it's already a power
/// of 2).
inline int
pow2roundup (int x)
{
    if (x < 0)
        return 0;
    --x;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x+1;
}
Larry Gritz
Except for minor syntax differences and the addition of that initial check, yours is also nearly identical to the version given in Henry S. Warren, Jr.'s "Hacker's Delight".
Boojum
@Boojum: Thanks for mentioning that book. I've checked it out and it's got the solution I need, plus much more!
Boyan
Maybe this solution will be slower than doing it in Assembler, but it has the advantage of being completely portable.
Boyan
So was your solution, @Boyan, and I think yours was a little more readable.
paxdiablo
@Pax: Yes, but this one executes in constant time. And it's not that hard to understand, especially if you run it two-three times through the debugger :)
Boyan
Wouldn't "if (x <= 0)" be better than "if (x < 0)" ?
Dipstick
I agree, this is a better version of my approach (which I deleted). +1!
erickson
Well, I guess it's time to call it a day. Probably can't get more efficient than that solution. Thanks to all!
Boyan
@Boyan: This solution is not portable e.g., how does it work on x64? (it doesn't)
J.F. Sebastian
@J.F.: It works with a minor change. Could be done for both architectures with conditional compilation.
Boyan
A lookup table based approach can be up to twice as fast as this one.
Sparr
Actually, this algorithm resembles what a multiply instruction does. I wonder if there's a way to make use of that?
Mike Dunlavey
+2  A: 

You don't really say what you mean by "better algorithm" but as the one you present is perfectly clear (if somewhat flawed), I'll assume you are after a more efficient algorithm.

Larry Gritz has given what is probably the most efficient C/C++ algorithm without the overhead of a lookup table, and it would suffice in most cases (see http://www.hackersdelight.org for similar algorithms).

As mentioned elsewhere, most CPUs these days have machine instructions to count the number of leading zeroes (or, equivalently, return the most-significant set bit), but their use is non-portable and, in most cases, not worth the effort.

However, most compilers have "intrinsic" functions that allow the use of these machine instructions in a more portable way.

Microsoft C++ has _BitScanReverse() and gcc provides __builtin_clz() to do the bulk of the work efficiently.
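
A sketch of the intrinsic route with gcc (the wrapper and its name are mine, not a canonical API):

```c
/* a sketch using gcc's __builtin_clz (count leading zeros), which
   typically compiles down to BSR/LZCNT on x86; the wrapper name is
   mine.  __builtin_clz(0) is undefined, hence the early return. */
unsigned next_pow2_clz(unsigned v)
{
    if (v <= 1) return 1;
    int msb = 31 - __builtin_clz(v - 1);  /* index of highest set bit of v-1 */
    return 1u << (msb + 1);
}
```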

Dipstick
+2  A: 

An exploration of the possible solutions to a closely related problem (that is, rounding down instead of up), many of which are significantly faster than the naive approach, is available on the Bit Twiddling Hacks page, an excellent resource for the kinds of optimization you are looking for. The fastest solution there uses a lookup table with 256 entries, which reduces the total operation count to around 7, from an average of 62 (by a similar operation-counting methodology) for the naive approach. Adapting those solutions to your problem is a matter of a single comparison and increment.
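
As an illustration, a byte-wide table for floor(log2) plus a final comparison-and-shift might look like this (the table, the init code, and all the names are mine, adapted from the idea described on that page rather than copied from it):

```c
/* a sketch of the byte-wide lookup-table approach: log_table[b] is
   floor(log2(b)) for one byte; table, init, and names are mine,
   following the idea described on the Bit Twiddling Hacks page */
static unsigned char log_table[256];

static void init_log_table(void)
{
    log_table[0] = 0;                 /* log2(0) is undefined; unused */
    log_table[1] = 0;
    for (int i = 2; i < 256; ++i)
        log_table[i] = log_table[i >> 1] + 1;
}

static unsigned floor_log2(unsigned v)
{
    unsigned t;
    if ((t = v >> 24)) return 24 + log_table[t];
    if ((t = v >> 16)) return 16 + log_table[t];
    if ((t = v >> 8))  return  8 + log_table[t];
    return log_table[v];
}

/* round down, then one comparison and a shift to round up */
unsigned next_pow2_table(unsigned v)
{
    if (v <= 1) return 1;
    unsigned p = 1u << floor_log2(v);
    return (p < v) ? p << 1 : p;
}
```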

Sparr
Thanks for the link, nice stuff there!
Boyan
+1  A: 

I know this is downvote-bait, but if the number is small enough (8 or 16 bits) a direct lookup might be fastest.

// fill in the table: tab[i] = highest power of 2 <= i (tab[0] = 0)
unsigned short tab[65536];
tab[0] = 0; tab[1] = 1;
for (unsigned int j = 2; j < 65536; ++j) tab[j] = tab[j >> 1] << 1;
unsigned short bit = tab[i];    // i is the 16-bit input value

It might be possible to extend it to 32 bits by first doing the high word and then the low.

//
unsigned long bitHigh = ((unsigned long)tab[(unsigned short)(i >> 16)]) << 16;
unsigned long bitLow = 0;
if (bitHigh == 0){
    bitLow = tab[(unsigned short)(i & 0xffff)];
}
unsigned long answer = bitHigh | bitLow;

It's probably no better than the shift-or methods, but maybe could be extended to larger word sizes.

(Actually, this gives the highest 1-bit. You'd have to shift it left by 1 to get the next higher power of 2.)

Mike Dunlavey
Using a 256-entry lookup table and combining the results for all 4 bytes of a 4-byte word is quite viable. The memory/speed tradeoff of a 65536-entry table is not great (14% speedup, 25500% memory increase).
Sparr
@Sparr. You're right. It's hard to beat the shift-or method, but it's interesting to keep an eye out for more ways to skin cats.
Mike Dunlavey
Once I heard a Dijkstra lecture on fun algorithms. He had a student who did an n-bit rotate by doing a sequence of 3 bit-reversals.
Mike Dunlavey
+2  A: 

In the spirit of Quake II's 0x5f3759df and the Bit Twiddling Hacks' IEEE version - this solution reaches into a double to extract the exponent as a means to calculate floor(lg2(n)). It's a bit faster than the accepted solution and much faster than the Bit Twiddling IEEE version since it avoids floating point math. As coded, it assumes a double is a real*8 IEEE float on a little endian machine.

int nextPow2(int n) 
{ 
    if ( n <= 1 ) return n;
    double d = n-1; 
    return 1 << ((((int*)&d)[1]>>20)-1022); 
}

Edit: Added an optimized x86 assembly version with the help of a co-worker. A 4% speed gain, but still about 50% slower than a BSR version (6 sec vs 4 on my laptop for n=1..2^31-2).

int nextPow2(int n) 
{ 
    if ( n <= 1 ) return n;
    double d;
    n--;
    __asm {
      fild    n 
      mov     eax,4
      fstp    d 
      mov     ecx, dword ptr d[eax]
      sar     ecx,14h 
      rol     eax,cl 
  }
}
Tony Lee
A: 

My version of the same:

int pwr2Test(size_t x) {
    return (x & (x - 1))? 0 : 1; 
}

size_t pwr2Floor(size_t x) {
    // A lookup table for rounding 4-bit numbers down to
    // the nearest power of 2.
    static const unsigned char pwr2lut[] = {
        0x00, 0x01, 0x02, 0x02,     //  0,  1,  2,  3
        0x04, 0x04, 0x04, 0x04,     //  4,  5,  6,  7
        0x08, 0x08, 0x08, 0x08,     //  8,  9, 10, 11
        0x08, 0x08, 0x08, 0x08      // 12, 13, 14, 15
    };

    size_t pwr2 = 0;                // The return value
    unsigned int i = 0;             // The nybble interator

    for( i = 0; x != 0; ++i ) {     // Iterate through nybbles,
        pwr2 = pwr2lut[x & 0x0f];   // rounding down to powers of 2.
        x >>= 4;                    // (i - 1) will contain the
    }                               // highest non-zero nybble index.

    i = i? (i - 1) : i;
    pwr2 <<= (i * 4);
    return pwr2; 
}

size_t pwr2Size(size_t x) {
    if( pwr2Test(x) ) { return x; }
    return pwr2Floor(x) * 2;
}
natersoz