views:

451

answers:

11

Are there any efficient bitwise operations I can do to get the number of set bits that an integer ends with? For example 1110 = 10112 would be two trailing 1 bits. 810 = 10002 would be 0 trailing 1 bits.

Is there a better algorithm for this than a linear search? I'm implementing a randomized skip list and using random numbers to determine the maximum level of an element when inserting it. I am dealing with 32 bit integers in C++.

Edit: assembler is out of the question, I'm interested in a pure C++ solution.

+2  A: 

There may be better answers available, particularly if assembler isn't out of the question, but one viable solution would be to use a lookup table. It would have 256 entries, each returning the number of contiguous trailing 1 bits. Apply it to the lowest byte. If it's 8, apply to the next and keep count.

Steven Sudit
Isn't the worst case of this the same though? If I understand correctly, I'd first check if the current byte has eight trailing bits, then seven, then six, etc. using the lookup table. Or do you mean something else?
IVlad
Something else! You'd look up the last byte, and if the answer is under 8, you're done. If not, repeat for the next to last byte and keep the total. You do a maximum of 4 lookups for a 32-bit number, with a simple test after each to determine whether to continue.
Steven Sudit
The worst case here is 4 lookups instead of 32 bitwise operations.
Sparr
To be clear, what I'm suggesting is still a linear search, but it's byte-by-byte, not bit-by-bit. It's still O(n), but with a smaller k.
Steven Sudit
Hmm, and technically, you don't need to keep the total. Your index will tell you how many bytes you've scanned. Multiply those by 8 and add the lookup of the stopping byte.
Steven Sudit
Some of the other answers are interesting and may be slightly faster or use a smaller table. My only concern is that, at a glance, I have no idea if they're correct.
Steven Sudit
I get it now, basically I look up every byte in the lookup table. I was thinking about something else initially. This does improve things, I'll consider it, thanks.
IVlad
@Steven Sudit - It isn't O(n), it is O(1) because you never have to do more than four lookups for any possible input (in this example). If you extended the problem to handle numbers of any size, then it is O(log n) because you are not dealing with the number, but the number of bytes needed to represent the number.
Robert Lamb
@Robert: I meant n as the number of bytes. In this case, n = k, so it's O(1). If we were doing this on an arbitrarily long sequence of bytes, then O(n) would make more sense.
Steven Sudit
A: 

This code counts the number of trailing zero bits, taken from here (there's also a version that depends on the IEEE 32 bit floating point representation, but I wouldn't trust it, and the modulus/division approaches look really slick - also worth a try):

int CountTrailingZeroBits(unsigned int v) // 32 bit
{
    unsigned int c = 32; // c will be the number of zero bits on the right

    static const unsigned int B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF, 0x0000FFFF};
    static const unsigned int S[] = {1, 2, 4, 8, 16}; // Our Magic Binary Numbers

    for (int i = 4; i >= 0; --i) // unroll for more speed
    {
        if (v & B[i])
        {
            v <<= S[i];
            c -= S[i];
        }
    }

    if (v)
    {
        c--;
    }

    return c;
}

and then to count trailing ones:

int CountTrailingOneBits(unsigned int v)
{
    return CountTrailingZeroBits(~v);
}
plinth
+1  A: 

A blazingly fast way to find the number of trailing 0's is given in Hacker's Delight.

You could complement your integer (or more generally, word) to find the number of trailing 1's.

Håvard S
You'd need to complement, not negate, the value.
Fred Larson
Well spotted, updated.
Håvard S
+6  A: 

The Bit Twiddling Hacks page has a number of algorithms for counting trailing zeros. Any of them can be adapted by simply inverting your number first, and there are probably clever ways to alter the algorithms in place without doing that as well. On a modern CPU with cheap floating point operations the best is probably thus:

unsigned int v=~input;            // find the number of trailing ones in input
int r;                     // the result goes here
float f = (float)(v & -v); // cast the least significant bit in v to a float
r = (*(uint32_t *)&f >> 23) - 0x7f;
if(r==-127) r=32;
Sparr
A: 

http://graphics.stanford.edu/~seander/bithacks.html might give you some inspiration.

Patrick
+11  A: 

Calculate ~i & (i + 1) and use the result as a lookup in a table with 32 entries. 1 means zero 1s, 2 means one 1, 4 means two 1s, and so on, except that 0 means 32 1s.

Ignacio Vazquez-Abrams
This is yet another solution whose correctness I cannot determine at a glance. Could you link to or include working code, please?
Steven Sudit
IVlad
Sparr
and I guess I'll implement this one too... :)
Sparr
See answer below to get rid of the tables.
phkahler
Please do :). I'm not sure how you can use a size 32 lookup table and easily get the count. Binary searching as in the OP's implementation kind of reduces performance.
IVlad
I think he meant something like a hash/assoc instead of a normal array for the lookup table. In PHP thus: array(1->0,2->1,4->2,8->3,etc)
Sparr
A flat lookup table would take 4 gigabytes, and a binary tree would do just as well with `(i^i+1)` which is simpler.
Potatoswatter
@sparr, `^` is xor, and use "@" when replying to a comment.
Potatoswatter
incorrect comment deleted. I'll use @ when it is automated.
Sparr
+2  A: 

Implementing Steven Sudit's idea...

uint32_t n; // input value
uint8_t o;  // number of trailing one bits in n

uint8_t trailing_ones[256] = {
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 5, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 6, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 5, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 7, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 5, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 6, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 5, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 8};

uint8_t t;
do {
  t=trailing_ones[n&255];
  o+=t;
} while(t==8 && (n>>=8))

1 (best) to 4 (worst) (average 1.004) times (1 lookup + 1 comparison + 3 arithmetic operations) minus one arithmetic operation.

Sparr
That's a pretty fast implementation, and will work in a template for any size int. The one I had in mind would probably have been slower, as it was based on a byte* and a count, which adds flexibility at a cost. If the input type is fixed, the loop could be unrolled for more speed.
Steven Sudit
Also, in all cases, the final right-shift is unnecessary. Therefore, moving the while into the loop (by changing into into an if/break) would speed it up slightly. Unrolling would be a bigger win, though. I wonder if the unrolling could be done by a template instead of by hand...
Steven Sudit
moved the shift, short circuiting should take care of it, AND now I save a loop iteration when the answer is a multiple of 8.
Sparr
+5  A: 
phkahler
Very nice and no lookup table at all. Thanks!
IVlad
However, is this better than the other solutions? Other than using less memory, the 256 ints lookup table solution executes less operations I think. It's hard to test since they are all very fast anyway, but which one do you think would win on a very slow computer?
IVlad
It should be faster than any solution with a loop or conditional, but you really need to test to know for sure.
phkahler
The loop for the lookup solution could be unrolled, but I think you're stuck with the conditional.
Steven Sudit
faster is very platform dependent. give me an ARM with conditional execution any day. for the uninitiated, almost every arm operation can be conditional. instead of having 16 branch commands (==0, <0, >=0, <=, <, >, >=, etc...) you have 4 bits of the instruction word dedicated to giving those same conditionals to everything. So there is just one branch operation, and you can make it a Branch If Less Than Or Equal with a modifier, but the same modifier can turn the Add operator into in Add If Less Than Or Equal, without any processing overhead.
Sparr
You can also drop some of the masks since you know certain additions won't overflow. i.e. the last 2 lines are just adding the bytes in b, none of which contain a value over 8. Again - test ;-)
phkahler
@Sparr: Interesting. I can certainly see how avoiding branches could speed things up.
Steven Sudit
A: 
lsalamon
I believe the reason you got downvoted is that your solution is essentially the bitwise linear search that the OP already had and was trying to improve upon.
Steven Sudit
A proof of concept would show who has better performance. Perfection is not the complexity.
lsalamon
I'm going to take a guess and say that your implementation is slower than Sparr's.
Steven Sudit
The algorithm is exposed as not working, at least I can not make it work.
lsalamon
No, the OP didn't say their algorithm didn't work. They just wanted something better than bitwise searching. But all you provided was an example of bitwise searching.
Steven Sudit
A: 

Implementation based on Ignacio Vazquez-Abrams's answer

uint8_t trailing_ones(uint32_t i) {
 return log2(~i & (i + 1));
}

Implementation of log2() is left as an exercise for the reader (see here)

Sparr
+1  A: 

GCC has __builtin_ctz and other compilers have their own intrinsics. Just protect it with an #ifdef:

#ifdef __GNUC__
int trailingones( uint32_t in ) {
    return ~ in == 0? 32 : __builtin_ctz( ~ in );
}
#else
// portable implementation
#endif

On x86, this builtin will compile to one very fast instruction. Other platforms might be somewhat slower, but most have some kind of bit-counting functionality that will beat what you can do with pure C operators.

Potatoswatter