views:

340

answers:

9

What is a fast way to compute the (long int) ceiling(log_2(i)), where the input and output are 64-bit integers? Solutions for signed or unsigned integers are acceptable. I suspect the best way will be a bit-twiddling method similar to those found here, but rather than attempt my own I would like to use something that is already well tested. A general solution will work for all positive values.

For instance, the values for 2,3,4,5,6,7,8 are 1,2,2,3,3,3,3

Edit: So far the best route seems to be to compute the integer/floor log base 2 (the position of the MSB) using any number of fast existing bithacks, and then to add one if the input is not a power of two. The fast bitwise check for powers of two is (n&(n-1)).

Another method might be to try a binary search on exponents e until 1<<e is greater than or equal to the input.

+3  A: 

Finding the log base 2 of an integer (64-bit or any other bit) with integer output is equivalent to finding the most significant bit that is set. Why? Because log base 2 is how many times you can divide the number by 2 to reach 1.

One way to find the MSB that's set is to simply bitshift to the right by 1 each time until you have 0. Another more efficient way is to do some kind of binary search via bitmasks.

The ceil part is easily worked out by checking if any other bits are set other than the MSB.

Brian R. Bondy
Don't you mean bitshift to the right by 1 each time until you have *1*? And since he wants the ceiling, it's that number + 1.
Computer Guru
@Computer Guru: No, it is not "that number + 1". It is "that number + 1 **if** the result isn't exact".
Andreas Rejbrand
Right, and the odds are that it *won't* be exact. So "that number + 1" is more accurate more often than "that number" :)
Computer Guru
Ok, what about this: Calculate the number of "1" bits, and sum 1 if the number of "1" bits is greater than 1.
luiscubal
I know generally how to go about it. The difficulty is figuring out which particular set of bit-wise operations is best. I was hoping there was an existing solution similar to those on the Stanford bithacks page.
+8  A: 

If you can limit yourself to gcc, there are a set of builtin functions which return the number of leading zero bits and can be used to do what you want with a little work:

int __builtin_clz (unsigned int x)
int __builtin_clzl (unsigned long)
int __builtin_clzll (unsigned long long)
ergosys
+1  A: 

OK, what I'm doing here is called "bit-fiddling." You need to understand how numbers are stored in the memory in order to get what's happening.

The number "300" is stored in the memory as: 100101100

And to get its value, the computer does: 2^0*0 + 2^1*0 + 2^2*1 + 2^3*1 .... + 2^8*1

The result is 8.something, the ceiling of which is 9.

We go backwards from (left to right) to determine the highest power, the value of which + 1 is the ceiling (assuming the rest are not zeros, of course).

The code:

int mathfunc(int64_t x)
{
    if(x < 1)
    {
        printf("Invalid input");
        return -1;
    }

    int64_t mask = 0x4000000000000000;
    for(int i = 63; i >= 0; --i)
    {
        if(mask & x)
            return i;
        mask = mask >> 1;
    }

    return -1;
}
Computer Guru
This solution is about the slowest possible and you should write `int64_t` rather than `long int` (which is usually 32bit)...
R..
How is this the "slowest possible" solution? It's the fastest available in C++. long int on most platforms *is* 64, but I agree. Updating now.Give me asm, and I'll use bittest 63 times to find the right answer. A decent amount faster.
Computer Guru
It's a linear search. You can do a binary search or use floating point to do this much faster.
R..
First of all your solution doesn't seem correct, not only the type is not portable, your bit mask is just an `int`. Also, you are close, but this is not exactly the ceiling as requested.And certainly not the fastest available in C or C++. You are just doing a sequential search on the bit positions, in the worst 64 iterations. As Brian indicates in his answer, a binary search for that position can be done. This would give you about log2(64) = 6 iterations.
Jens Gustedt
It *is* the ceiling unless the value is actually a whole number power of two. In that case, it'd be the result -1.... And you can't do a binary search because it's not ordered. If you jump into the middle of the bitmask and find a 0, there's an equal chance of finding a one *on either side of your start position*. Binary search *only works for sorted arrays*. You'd have to first *sort* the bitmask, keep the NewIndex:OldIndex mapping stored somewhere, and *then* use binary search. Good luck with that!In hardware, it's easy enough. 64 comparators in parallel, a priority encoder on the output
Computer Guru
Oh, and how is the bit mask an int? int64_t and int are _not_ the same. Or did the `int i = 63` throw you off? That's the iteration count, and is not used in the masking process. It's easy enough to optimize my code a bit - you can first perform a mathematical operation to narrow down the start bit from 63 to something more reasonable. Even a bunch of jumps.
Computer Guru
@Computer Guru certainly you can do a binary search, if you know how... start by masking with 0xFFFFFFFF00000000, if true then try 0xFFFF000000000000, else try 0x00000000FFFF0000, etc. :)
hobbs
@hobbs: Didn't think of that one. Good point :)
Computer Guru
@hobbs: Goes to show the sad state of CS education... teaching implementations rather than concepts. *sigh*
R..
@hobbs You will probably want to use a precalculated array[65536] to trim your tree height.
Thomas Ahle
@Thomas Ahle that would seem like the smart thing, sure, but I was really just sketching the idea. I have no plans on implementing it. And *conceptually* you can say "stop when the mask is 1 bit wide".
hobbs
@hobbs Right, but if anybody reads this and actually wants to do this, they should know, that when you work with optimizations on this level, there is no way to correctly analyze the performance.I bet the algorithm used by crafty and gnuchess: http://bit.ly/a2LJPW is plenty faster than the binary search.
Thomas Ahle
@R. I wish I learned less theory and more practical, myself...
Will
@Thomas: I honestly believe this is one of those cases that is just plain difficult to optimize at the high level, and which, when done at the lowest levels of asm, is perfectly speedy, and due to in-hardware optimizations is incredibly efficient. If you don't need this to be compiled as 64-bit and you don't care much for compatibility, just wrap it in an `asm{}` tag and do it there.
Computer Guru
@Computer Guru: Sure, for most anybody this is just theory, but still, any chess engine I've read has hyper optimized code for this, even just as fallback for assembly, so it must have some right.
Thomas Ahle
+1  A: 

If you have 80-bit or 128-bit floats available, cast to that type and then read off the exponent bits. This link has details (for integers up to 52 bits) and several other methods:

http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogIEEE64Float

Also, check the ffmpeg source. I know they have a very fast algorithm. Even if it's not directly extensible to larger sizes, you can easily do something like if (x>INT32_MAX) return fastlog2(x>>32)+32; else return fastlog2(x);

R..
Checking the source of popular projects is a good idea.
+5  A: 

If you're compiling for 64-bit processors on Windows, I think this should work. _BitScanReverse64 is an intrinsic function.

#include <intrin.h>
__int64 log2ceil( __int64 x )
{
  unsigned long index;
  if ( !_BitScanReverse64( &index, x ) )
     return -1LL; //dummy return value for x==0

  // add 1 if x is NOT a power of 2 (to do the ceil)
  return index + (n&(n-1)?1:0);
}

For 32-bit, you can emulate _BitScanReverse64, with 1 or 2 calls to _BitScanReverse. Check the upper 32-bits of x, ((long*)&x)[1], then the lower 32-bits if needed, ((long*)&x)[0].

Tom Sirgedas
I was thinking something more general than Windows-specific, but so far yours is the closest to the answer. Translated, the idea is that there is a fast bitwise method for checking whether something is a power of two (shown above). We can use this method in conjunction with a register method for determining the position of the MSB to retrieve the answer.
casablanca
A: 

The true fastest solution:

A binary search tree of 63 entries. These are the powers of 2 from 0 to 63. One-time generation function to create the tree. The leafs represent the log base 2 of the powers (basically, the numbers 1-63).

To find the answer, you feed a number into the tree, and navigate to the leaf node greater than the item. If the leaf node is exactly equal, result is the leaf value. Otherwise, result is the leaf value + 1.

Complexity is fixed at O(6).

Computer Guru
+1 really nice idea
Dave
of course you don't really need a tree, just adapt the bisection method for root finding to halve the interval each iteration.
GregS
Dave: Thanks.Greg: Of course. Tree is easier to visualize for the uninitiated, though.
Computer Guru
In practice, I think this kind of branching will be slowish on the CPU (a benchmark would be interesting!). Faster is possible because we can use instructions which operate on 64-bits in parallel. Anyways, your approach is like an unrolled version of "Find the log base 2 of an N-bit integer in O(lg(N)) operations" in the link. And... O(6)=O(1)=O(99999)
Tom Sirgedas
The binary search does not require any branching. It can be done entirely with arithmetic on sign bits/carry flags.
R..
A: 

My apologies for posting this Java code, but I think it translates almost exactly to C. Java doesn't have unsigned types so I just considered positive longs. This is inspired by Computer Guru's answer and the binary search comment in Brian Bondy's answer

public static int log2(long x)
{
    int log2low = 0;
    int log2high = 62;

    if (x <= 0)
    {
        return -1;
    }

    while ((log2high-log2low) > 1)
    {
        /*
         * invariant: 2**log2low <= x <= 2**log2high
         */
        final int log2mid = (log2low + log2high) / 2;
        final long mid = 1L << log2mid;
        if (x > mid)
        {
            log2low = log2mid;
        }
        else if (x < mid)
        {
            log2high = log2mid;
        }
        else // x is a power of 2
        {
            return log2mid;
        }
    }
    return (x == (1<<log2low)) ? log2low : log2high;
}

Note: this can be accelerated by precomputing a table of size 2**k that maps the k high order bits of the argument z to the ceil(log2(z)). In other words, instead of the loop condition (log2high-log2low) > 1 it becomes (log2high-log2low) > k.

GregS
+1  A: 
#include "stdafx.h"
#include "assert.h"

int getpos(unsigned __int64 value)
{
    if (!value)
    {
      return -1; // no bits set
    }
    int pos = 0;
    if (value & (value - 1ULL))
    {
      pos = 1;
    }
    if (value & 0xFFFFFFFF00000000ULL)
    {
      pos += 32;
      value = value >> 32;
    }
    if (value & 0x00000000FFFF0000ULL)
    {
      pos += 16;
      value = value >> 16;
    }
    if (value & 0x000000000000FF00ULL)
    {
      pos += 8;
      value = value >> 8;
    }
    if (value & 0x00000000000000F0ULL)
    {
      pos += 4;
      value = value >> 4;
    }
    if (value & 0x000000000000000CULL)
    {
      pos += 2;
      value = value >> 2;
    }
    if (value & 0x0000000000000002ULL)
    {
      pos += 1;
      value = value >> 1;
    }
    return pos;
}

int _tmain(int argc, _TCHAR* argv[])
{    
    assert(getpos(0ULL) == -1); // None bits set, return -1.
    assert(getpos(1ULL) == 0);
    assert(getpos(2ULL) == 1);
    assert(getpos(3ULL) == 2);
    assert(getpos(4ULL) == 2);
    for (int k = 0; k < 64; ++k)
    {
        int pos = getpos(1ULL << k);
        assert(pos == k);
    }
    for (int k = 0; k < 64; ++k)
    {
        int pos = getpos( (1ULL << k) - 1);
        assert(pos == (k < 2 ? k - 1 : k) );
    }
    for (int k = 0; k < 64; ++k)
    {
        int pos = getpos( (1ULL << k) | 1);
        assert(pos == (k < 1 ? k : k + 1) );
    }
    for (int k = 0; k < 64; ++k)
    {
        int pos = getpos( (1ULL << k) + 1);
        assert(pos == k + 1);
    }
    return 0;
}
rwong
Fast and correct
Lookup table would eliminate the last several `if` clauses, four or more of them depending on the available memory.
To use a 16-bit lookup table: declare global `short getpos_lookup[1 << 16];`. Pre-populate: `int i; for(i=1;i<1<<16;i++) getpos_lookup[i]=log2(i);` Then under the `16` case comment out the `8/4/2/1` cases and put `pos += getpos_lookup[v];`.
Actually i'm not even sure if my version is fast at all. Using a for-loop might be faster than any of the other approaches.
rwong
I ran some simple tests. Your method is appreciably faster than a loop (e.g. `while(value>>=1)pos++;`). Modifying your method to use a lookup table is slightly faster, but I wouldn't call it appreciably faster.For my purposes your method is already fast enough. However, if someone were looking to continue improving on it, I would look into: 1. replacing the MSB detection with a register call (potentially using `#ifdef` statements for portability). 2. employing some heuristics to exploit known distributions of the input (e.g. 90% of incoming numbers are under 1000) 3. use of lookup tables
A: 

You might be able to gain some inspiration from the venerable HAKMEM.

http://www.inwap.com/pdp10/hbaker/hakmem/hacks.html#item160

Ken Smith