Is there a straightforward way to extract the exponent from a power of 2 using bitwise operations only?

EDIT: Although the question was originally about bitwise operations, the thread is also a good read if you are wondering "What's the fastest way to find X given Y = 2**X in Python?"

I am currently trying to optimize a routine (Rabin-Miller primality test) that reduces an even number N to the form 2**s * d. I can get the 2**s part by:

two_power_s = N & -N

but I can't find a way to extract just "s" with a bitwise operation. The workarounds I am currently testing, without much satisfaction (they are all pretty slow), are the following (sketched in code after the list):

  • using the logarithm function
  • manipulating the binary representation of 2**s (i.e. counting the trailing zeroes)
  • looping on a division by 2 until the result is 1
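For concreteness, here is a minimal sketch of those three workarounds (the function names are mine, just for illustration):

import math

def s_via_log(two_power_s):
    # 1. using the logarithm function; round() guards against float error
    return int(round(math.log(two_power_s, 2)))

def s_via_trailing_zeros(two_power_s):
    # 2. counting the trailing zeroes of the binary representation
    binary = bin(two_power_s)  # e.g. '0b1000' for 8
    return len(binary) - len(binary.rstrip('0'))

def s_via_division(two_power_s):
    # 3. looping on a division by 2 until the result is 1
    s = 0
    while two_power_s > 1:
        two_power_s //= 2
        s += 1
    return s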

I am using python, but the answer to this question should be language agnostic, I suppose.

Thank you in advance for your time.

+4  A: 

There is a page with a lot of these types of tricks and hacks. It's written for C, but many of them should work in Python too (though the performance will obviously be different). The bit you want is here and onwards.

You could try this for example:

unsigned int v;  // 32-bit value to find the log2 of
static const unsigned int b[] = {0x2, 0xC, 0xF0, 0xFF00, 0xFFFF0000};
static const unsigned int S[] = {1, 2, 4, 8, 16};
int i;

register unsigned int r = 0; // result of log2(v) will go here
for (i = 4; i >= 0; i--) // unroll for speed...
{
  if (v & b[i])
  {
    v >>= S[i];
    r |= S[i];
  }
}

That looks like it could be converted to Python quite easily.
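For reference, a fairly literal Python port might look like the sketch below (the b and S tables are the ones from the C version; mac's answer further down times essentially this code as "bitwise"):

def log2_bitwise(v):
    """Python port of the C fragment above; valid for v < 2**32."""
    b = (0x2, 0xC, 0xF0, 0xFF00, 0xFFFF0000)
    S = (1, 2, 4, 8, 16)
    r = 0
    for i in range(4, -1, -1):  # note: i runs 4, 3, 2, 1, 0
        if v & b[i]:
            v >>= S[i]
            r |= S[i]
    return r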

Mark Byers
+1 for the link!
Ninefingers
Nice link, thanks. See my own answer to the question (which I will post in a minute) to see how this performed IRL with python...
mac
The compiler will of course unroll this for you, not to mention the `register` keyword is pretty much useless.
GMan
+3  A: 

You can do it in O(lg s) time for arbitrary-length integers using a binary search.

import sys
def floorlg(n):
    if n < 1:
        return -1
    low=0
    high=sys.getsizeof(n)*8 # not the best upper-bound guesstimate, but...
    while True:
        mid = (low+high)//2
        i = n >> mid
        if i == 1:
            return mid
        if i == 0:
            high = mid-1
        else:
            low = mid+1

For fixed size integers, a lookup table should be the fastest solution, and probably best overall.
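For illustration, here is a minimal sketch of that lookup-table idea for powers of two up to 2**32 (the names are hypothetical; brone's answer below explores this approach in depth):

# Precompute the exponent for every power of two up to 2**32.
pow2_exponent = dict([(1 << s, s) for s in range(33)])

def lookup_log2(v):
    return pow2_exponent[v]  # raises KeyError if v is not a power of two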

outis
Just a note, as near as I can tell this function hangs on numbers larger than 2^11.
Jordan Reiter
D'oh. Haste makes mistakes. Fixed now.
outis
Isn't this O(lg lg n)? That is, if n = 2**x, then this is O(lg x)?
GregS
@outis - I added the timing for your code to my answer.
mac
@GregS: I was sloppy. The 'n' in the timing was a different 'n' than in the code. Fixed.
outis
+1  A: 

Convert your power of 2 to hexadecimal (e.g., using '%x' formatting) and count the number of digits, then adjust for the first digit.

Here's a function that computes the bitlength of a Python integer; it'll return one more than the number you actually want (e.g., 2**100 has a bitlength of 101).

def nbits(n, correction = {
        '0': 4, '1': 3, '2': 2, '3': 2,
        '4': 1, '5': 1, '6': 1, '7': 1,
        '8': 0, '9': 0, 'a': 0, 'b': 0,
        'c': 0, 'd': 0, 'e': 0, 'f': 0}):
    """Number of bits in binary representation of the positive integer n,
    or 0 if n == 0.
    """
    if n < 0:
        raise ValueError("The argument to nbits should be nonnegative.")
    hex_n = "%x" % n
    return 4*len(hex_n) - correction[hex_n[0]]

In Python 3.1 (and 2.7, when it appears), there's a bit_length method on integers that does the same thing.
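For example, with that method the exponent of an exact power of two drops out directly:

s = (2**100).bit_length() - 1  # s == 100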

Mark Dickinson
That is almost certainly much less efficient than his original methods.
GregS
Perhaps you should try timing it: you may be surprised. It depends on size of input a bit, but for e.g. 1000-bit integers, on my machine, it's an order of magnitude faster than repeatedly shifting until the result is 1. The key here is that this is Python, which isn't known for its speed, but almost all the actual work happens in the "%x" % n, which runs at C speed. There's a reason this exact code is used in Python's decimal module. I'd actually expect something like "int(round(math.log(n, 2)))" to be faster than either of these when you know the input is a power of 2.
Mark Dickinson
Isn't your function just the equivalent of `len(bin(n)) - 3`, or is there something I am missing?
mac
@mac: Yes, the bin version is equivalent (though I think the above function would actually be `len(bin(n))-2`); it would be worth testing to see which is faster. Also note that bin isn't available in Python <= 2.5.
Mark Dickinson
@GregS: So indeed, the fastest method turns out to be one closely related to the one I proposed. Can I have my downvote back? :)
Mark Dickinson
@Mark Dickinson: Yes, your function returns the same value as len(bin(n)) - 2. On my machine, the built-in bin is noticeably faster, as one might (and probably should) expect. Still, I think your function may be useful for those who are on Python 2.5 or earlier.
John Y
+5  A: 
Gregory Maxwell
Thank you for your post. My library is just something I am developing for fun (actually: to improve my hacking skills in python), so I would say that, more than "performance being critical", for me it is a matter of "being critical about performance" (i.e. understanding how performance is affected by various factors in python). In this regard, I am posting my findings in a moment. Thank you for your code anyhow... helpful! :)
mac
@Gregory Maxwell - I managed to port your code to python (surprisingly I had to add a `-1` to the result). In python it does not seem to perform very well though. I wonder how the performance of `log_e` compares to `ilog` in C (see the code in my answer). Thanks for your time!
mac
Interesting. Well, I suppose it's an example of the impedance mismatch between high and low level programming environments. On my 1.6GHz Core2 laptop it takes 38,737,220 µs to run the above ilog() on 1-2147483647 and accumulate the result. 18ns per execution ... around 29 clocks. Not bad (floating point log2 plus casts is ~82ns), but using __builtin_clz() instead takes only 3,484,684 µs, or 1.62ns per execution... 2.6 clocks, which is consistent with the assembly produced and the cpu's internal parallelism (a three instruction loop using bsrl and two adds).
Gregory Maxwell
+2  A: 

Short answer

As far as python is concerned:

  • The fastest method of all to find the exponent of 2**x is looking it up in a dictionary whose keys are the powers of 2 (see "hashlookup" in the code)
  • The fastest bitwise method is the one called "unrolled_bitwise".
  • Both previous methods have well-defined (but extensible) upper limits. The fastest method without hard-coded upper limits (which scales up as far as python can handle numbers) is "log_e".

Preliminary notes

  1. All speed measurements below have been obtained via timeit.Timer.repeat(testn, cycles) where testn was set to 3 and cycles was automatically adjusted by the script to obtain times in the range of seconds (note: there was a bug in this auto-adjusting mechanism that was fixed on 18/02/2010).
  2. Not all methods scale, which is why I did not test all functions for the various powers of 2.
  3. I did not manage to get some of the proposed methods to work (the function returns a wrong result). I have not yet had time to do a step-by-step debugging session: I included the code (commented out) just in case somebody spots the mistake by inspection (or wants to debug it themselves).

Results

func(2**5)

hashlookup:          0.13s     100%
lookup:              0.15s     109%
stringcount:         0.29s     220%
unrolled_bitwise:    0.36s     272%
log_e:               0.60s     450%
bitcounter:          0.64s     479%
log_2:               0.69s     515%
ilog:                0.81s     609%
bitwise:             1.10s     821%
olgn:                1.42s    1065%

func(2**31)

hashlookup:          0.11s     100%
unrolled_bitwise:    0.26s     229%
log_e:               0.30s     268%
stringcount:         0.30s     270%
log_2:               0.34s     301%
ilog:                0.41s     363%
bitwise:             0.87s     778%
olgn:                1.02s     912%
bitcounter:          1.42s    1264%

func(2**128)

hashlookup:     0.01s     100%
stringcount:    0.03s     264%
log_e:          0.04s     315%
log_2:          0.04s     383%
olgn:           0.18s    1585%
bitcounter:     1.41s   12393%

func(2**1024)

log_e:          0.00s     100%
log_2:          0.01s     118%
stringcount:    0.02s     354%
olgn:           0.03s     707%
bitcounter:     1.73s   37695%

Code

import math, sys

def stringcount(v):
    """mac"""    
    return len(bin(v)) - 3

def log_2(v):
    """mac"""    
    return int(round(math.log(v, 2), 0)) # 2**101 generates 100.999999999

def log_e(v):
    """bp on mac"""    
    return int(round(math.log(v)/0.69314718055994529, 0))  # 0.69 == log(2)

def bitcounter(v):
    """John Y on mac"""
    r = 0
    while v > 1 :
        v >>= 1
        r += 1
    return r

def olgn(n) :
    """outis"""
    if n < 1:
        return -1
    low = 0
    high = sys.getsizeof(n)*8 # not the best upper-bound guesstimate, but...
    while True:
        mid = (low+high)//2
        i = n >> mid
        if i == 1:
            return mid
        if i == 0:
            high = mid-1
        else:
            low = mid+1

def hashlookup(v):
    """mac on brone -- limit: v < 2**131"""
#    def prepareTable(max_log2=130) :
#        hash_table = {}
#        for p in range(1, max_log2) :
#            hash_table[2**p] = p
#        return hash_table

    global hash_table
    return hash_table[v] 

def lookup(v):
    """brone -- limit: v < 2**11"""
#    def prepareTable(max_log2=10) :
#        log2s_table=[0]*((1<<max_log2)+1)
#        for i in range(max_log2+1):
#            log2s_table[1<<i]=i
#        return tuple(log2s_table)

    global log2s_table
    return log2s_table[v]

def bitwise(v):
    """Mark Byers -- limit: v < 2**32"""
    b = (0x2, 0xC, 0xF0, 0xFF00, 0xFFFF0000)
    S = (1, 2, 4, 8, 16)
    r = 0
    for i in range(4, -1, -1) :
        if (v & b[i]) :
            v >>= S[i]
            r |= S[i]
    return r

def unrolled_bitwise(v):
    """x4u on Mark Byers -- limit:   v < 2**33"""
    r = 0
    if v > 0xffff :
        v >>= 16
        r = 16
    if v > 0x00ff :
        v >>=  8
        r += 8
    if v > 0x000f :
        v >>=  4
        r += 4
    if v > 0x0003 :
        v >>=  2
        r += 2
    return r + (v >> 1)

def ilog(v):
    """Gregory Maxwell - (Original code: B. Terriberry) -- limit: v < 2**32"""
    ret = 1
    m = (not not v & 0xFFFF0000) << 4
    v >>= m
    ret |= m
    m = (not not v & 0xFF00) << 3
    v >>= m
    ret |= m
    m = (not not v & 0xF0) << 2
    v >>= m
    ret |= m
    m = (not not v & 0xC) << 1
    v >>= m
    ret |= m
    ret += (not not v & 0x2)
    return ret - 1


# following table is equal to "return hashlookup.prepareTable()" 
hash_table = {...} # numbers have been cut out to avoid cluttering the post

# following table is equal to "return lookup.prepareTable()" - cached for speed
log2s_table = (...) # numbers have been cut out to avoid cluttering the post
mac
Every execution after the first of `logm` can be sped up by 25%¹ by calculating `log2 = log(2)` outside the function, then replacing `return int(log(v, 2))` with `return int(log(v)/log2)`. (¹ Measured through trivial application of `timeit.timeit()`.)
badp
@bp - Nice! I changed the code, re-ran the tests, and edited my answer accordingly.
mac
@mac: On my machine, your counter function isn't nearly as fast as logm. Also, your counter function can be improved slightly by starting r at 0 and looping while v > 1 and simply returning when the loop is done (this gets rid of the if inside the loop).
John Y
@John Y - I modified and re-timed the function according to your comment (thank you). As for the counter performing significantly differently on your machine, my only hypothesis (and I wish to underline it is **just a hypothesis** I am not able to verify) is that it could depend either on having a different python version (I am using python 2.6.4 compiled with GCC 4.4.1 on a linux box) or on having a processor with a different instruction set (I am on a Core2 T7400 @ 2.16 GHz). Anybody else reading this and wishing to share their performance?
mac
@mac: For bitwise, you've mistranslated the for loop from C. Notice that i >= 0, so you need range(4, -1, -1). Of course, with such a small range (and for a slight performance enhancement), you might as well just use the literal (4, 3, 2, 1, 0). Correctly translated, bitwise works for 32-bit values.
John Y
@mac: Now your published performance of bitcounter (which had been called counter) is more in line with what I was seeing. In your original timings, bitcounter was competitive with what is now called log_e. I suspect this is because you originally allowed very small values to be passed to the functions being tested, so bitcounter hardly had to loop at all.
John Y
@John Y - Thank you for looking into this. I have now performed the tests on the two "fixed" functions, and I also managed to correctly port to python the one proposed by Gregory Maxwell. Alas, the previous timings (but not the very initial ones) were affected by a bug I introduced in my timing utility. Should be fixed now (see my comment to brone's answer).
mac
+1  A: 

It seems like the range is known. Let's assume it goes up to 1<<20, just to make it more interesting:

max_log2=20

So make a list that (in effect) maps an integer to its base 2 logarithm. The following will do the trick:

log2s_table=[0]*((1<<max_log2)+1)
for i in range(max_log2+1):
    log2s_table[1<<i]=i

(This doesn't do anything useful for numbers that aren't powers of two; the problem statement suggests they don't need to be handled. Would be easy enough to fix that though.)
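For instance, here is one possible fix (a sketch of mine, not part of the original answer): store floor(log2) for every index, so that numbers that aren't powers of two get a sensible answer as well:

log2s_floor_table = [0]*((1 << max_log2) + 1)
for i in range(2, (1 << max_log2) + 1):
    # floor(log2(i)) is one more than floor(log2(i >> 1))
    log2s_floor_table[i] = log2s_floor_table[i >> 1] + 1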

The function to get the logarithm is very simple, and could easily be inlined:

def table(v):
    return log2s_table[v]

I can't guarantee that the test code I wrote is exactly the same as the one being used to get the example timings, but this is rather quicker than the stringcount code:

stringcount: 0.43 s.
table: 0.16 s.

Since all the values in the table are less than 256, I wondered whether using a string instead of a list would be quicker, or maybe an array.array of bytes, but no dice:

string: 0.25 s.
arr: 0.21 s.

Using a dict to do the lookup is another possibility, taking advantage of the way only powers of two are being checked:

log2s_map=dict([(1<<x,x) for x in range(max_log2+1)])

def map(v):
    return log2s_map[v]

The results for this weren't as good, though:

map: 0.20 s.

And just for fun one could also use the hex method on float objects to get a string that includes (as its last part) the base 2 exponent of the number. This is a bit slow to extract in general, but if the exponent is only ever going to be one digit it could be done straightforwardly enough:

def floathex(v):
    return ord(float(v).hex()[-1])-48

This is purely for entertainment value though, as it was uncompetitive -- though, amazingly, still quicker than the bitwise approach.
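For exponents with more than one digit, one could parse the whole exponent field instead (a variant of mine, not part of the original answer; it works for any power of two that is exactly representable as a float):

def floathex_general(v):
    # float.hex() ends in 'p+<exponent>' for exact powers of two >= 1
    return int(float(v).hex().split('p')[-1])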

So it looks like using a list is the way to go.

(This approach won't scale indefinitely, due to limited memory, but to make up for that the execution speed won't depend on max_log2, or the input values, in any way that you'll notice when running python code. Regarding the memory consumption, if I remember my python internals correctly, the table will take up about (1<<max_log2)*4 bytes, because the contents are all small integers that the interpreter will intern automatically. So, when max_log2 is 20, that's about 4MB.)

brone
@brone - Thank you for this. I tested your function but it does not seem to perform very well, at least on my machine (see timings in my re-written answer). Did I misunderstand anything?
mac
Your table function does a LOT more work than the one presented here -- there's a function definition in there! (Same for the hashlookup; use dis.dis to see the difference between having it in there and taking it out.) The table preparation is deliberately global so that the function does nothing except for the table lookup. By adding the function definition in there, the execution time increased, though it was still quicker than (say) len(bin(v)) or math.log(v,2) (this is what I would expect). Maybe if you post the testing code this conundrum could be solved.
brone
@brone - You were right in all you wrote. 1) The function definition [now commented out] had a ~25% impact on each of the two functions 2) The lookup approach - at least when upper limits are known and the time to generate the look-up table is not considered or is spread across many cycles - is the best performing 3) There was a mistake in my timing utility that I introduced with the modifications done on 16/2. Should be fixed now. Here's the "core" of the timing utility, should you wish to check it out: http://pastebin.com/f7851ef6d Thanks a lot for your time and input!
mac
+1  A: 

This is actually a comment on the performance test posted by mac. I am posting it as an answer to get proper code formatting and indentation.

mac, could you try an unrolled implementation of the bit search suggested by Mark Byers? Maybe it's just the array access that slows it down. In theory this approach should be faster than the others.

It would look something like this; I'm not sure whether the formatting is right for python, but I guess you can see what it is supposed to do.

def bitwise(v):
    r = 0;
    if( v > 0xffff ) : v >>= 16; r = 16;
    if( v > 0x00ff ) : v >>=  8; r += 8;
    if( v > 0x000f ) : v >>=  4; r += 4;
    if( v > 0x0003 ) : v >>=  2; r += 2;
    return r + ( v >> 1 );

If python shares java's lack of unsigned integers, it would need to be something like this:

def bitwise(v):
    r = 0;
    if( v & 0xffff0000 ) : v >>>= 16; r = 16;
    if( v > 0x00ff ) : v >>=  8; r += 8;
    if( v > 0x000f ) : v >>=  4; r += 4;
    if( v > 0x0003 ) : v >>=  2; r += 2;
    return r + ( v >> 1 );
x4u
@x4u - There was a typo in the code (which I took the liberty of fixing - thanks to John Y for pointing it out). Timing results are now in the chosen answer. BTW: your method turned out to be the second most efficient one! :)
mac
Thank you for fixing the typo, and also John Y for spotting it. It's interesting but no real surprise that the hash lookup turned out to be the fastest. The advantage of the bitwise approach is that it performs a true log2 for all numbers in the supported range (not just for 2**x), whereas the hash lookup would require an excessive amount of memory to do this.
x4u