ansaurus

Question

Answer 1

+2 A:

There may be better answers available, particularly if assembler isn't out of the question, but one viable solution would be to use a lookup table. It would have 256 entries, each returning the number of contiguous trailing 1 bits. Apply it to the lowest byte. If it's 8, apply to the next and keep count.

Steven Sudit 2010-03-04 16:09:27

Isn't the worst case of this the same though? If I understand correctly, I'd first check if the current byte has eight trailing bits, then seven, then six, etc. using the lookup table. Or do you mean something else?

IVlad 2010-03-04 16:13:43

Something else! You'd look up the last byte, and if the answer is under 8, you're done. If not, repeat for the next to last byte and keep the total. You do a maximum of 4 lookups for a 32-bit number, with a simple test after each to determine whether to continue.

Steven Sudit 2010-03-04 16:16:02

The worst case here is 4 lookups instead of 32 bitwise operations.

Sparr 2010-03-04 16:16:51

To be clear, what I'm suggesting is still a linear search, but it's byte-by-byte, not bit-by-bit. It's still O(n), but with a smaller k.

Steven Sudit 2010-03-04 16:17:43

Hmm, and technically, you don't need to keep the total. Your index will tell you how many bytes you've scanned. Multiply those by 8 and add the lookup of the stopping byte.

Steven Sudit 2010-03-04 16:19:24

Some of the other answers are interesting and may be slightly faster or use a smaller table. My only concern is that, at a glance, I have no idea if they're correct.

Steven Sudit 2010-03-04 16:22:15

I get it now, basically I look up every byte in the lookup table. I was thinking about something else initially. This does improve things, I'll consider it, thanks.

IVlad 2010-03-04 16:30:33

@Steven Sudit - It isn't O(n), it is O(1) because you never have to do more than four lookups for any possible input (in this example). If you extended the problem to handle numbers of any size, then it is O(log n) because you are not dealing with the number, but the number of bytes needed to represent the number.

Robert Lamb 2010-03-04 16:42:05

@Robert: I meant n as the number of bytes. In this case, n = k, so it's O(1). If we were doing this on an arbitrarily long sequence of bytes, then O(n) would make more sense.

Steven Sudit 2010-03-04 16:58:16

Answer 2

A:

This code counts the number of trailing zero bits, taken from here (there's also a version that depends on the IEEE 32 bit floating point representation, but I wouldn't trust it, and the modulus/division approaches look really slick - also worth a try):

int CountTrailingZeroBits(unsigned int v) // 32 bit
{
    unsigned int c = 32; // c will be the number of zero bits on the right

    static const unsigned int B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF, 0x0000FFFF};
    static const unsigned int S[] = {1, 2, 4, 8, 16}; // Our Magic Binary Numbers

    for (int i = 4; i >= 0; --i) // unroll for more speed
    {
        if (v & B[i])
        {
            v <<= S[i];
            c -= S[i];
        }
    }

    if (v)
    {
        c--;
    }

    return c;
}

and then to count trailing ones:

int CountTrailingOneBits(unsigned int v)
{
    return CountTrailingZeroBits(~v);
}

plinth 2010-03-04 16:12:58

Answer 3

+1 A:

A blazingly fast way to find the number of trailing 0's is given in Hacker's Delight.

You could complement your integer (or more generally, word) to find the number of trailing 1's.

Håvard S 2010-03-04 16:13:27

You'd need to complement, not negate, the value.

Fred Larson 2010-03-04 16:17:10

Well spotted, updated.

Håvard S 2010-03-04 16:19:10

Answer 4

+6 A:

The Bit Twiddling Hacks page has a number of algorithms for counting trailing zeros. Any of them can be adapted by simply inverting your number first, and there are probably clever ways to alter the algorithms in place without doing that as well. On a modern CPU with cheap floating point operations the best is probably thus:

unsigned int v=~input;            // find the number of trailing ones in input
int r;                     // the result goes here
float f = (float)(v & -v); // cast the least significant bit in v to a float
r = (*(uint32_t *)&f >> 23) - 0x7f;
if(r==-127) r=32;

Sparr 2010-03-04 16:14:33

Answer 5

A:

http://graphics.stanford.edu/~seander/bithacks.html might give you some inspiration.

Patrick 2010-03-04 16:14:37

Answer 6

+11 A:

Calculate ~i & (i + 1) and use the result as a lookup in a table with 32 entries. 1 means zero 1s, 2 means one 1, 4 means two 1s, and so on, except that 0 means 32 1s.

Ignacio Vazquez-Abrams 2010-03-04 16:16:49

This is yet another solution whose correctness I cannot determine at a glance. Could you link to or include working code, please?

Steven Sudit 2010-03-04 16:25:35

IVlad 2010-03-04 16:36:35

Sparr 2010-03-04 16:45:57

and I guess I'll implement this one too... :)

Sparr 2010-03-04 16:46:20

See answer below to get rid of the tables.

phkahler 2010-03-04 16:52:49

Please do :). I'm not sure how you can use a size 32 lookup table and easily get the count. Binary searching as in the OP's implementation kind of reduces performance.

IVlad 2010-03-04 16:53:35

I think he meant something like a hash/assoc instead of a normal array for the lookup table. In PHP thus: array(1->0,2->1,4->2,8->3,etc)

Sparr 2010-03-04 22:14:13

A flat lookup table would take 4 gigabytes, and a binary tree would do just as well with `(i^i+1)` which is simpler.

Potatoswatter 2010-03-05 04:28:15

@sparr, `^` is xor, and use "@" when replying to a comment.

Potatoswatter 2010-03-05 10:10:38

incorrect comment deleted. I'll use @ when it is automated.

Sparr 2010-03-05 16:31:19

Answer 7

+2 A:

Implementing Steven Sudit's idea...

uint32_t n; // input value
uint8_t o;  // number of trailing one bits in n

uint8_t trailing_ones[256] = {
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 5, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 6, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 5, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 7, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 5, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 6, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 5, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4, 
0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 8};

uint8_t t;
do {
  t=trailing_ones[n&255];
  o+=t;
} while(t==8 && (n>>=8))

1 (best) to 4 (worst) (average 1.004) times (1 lookup + 1 comparison + 3 arithmetic operations) minus one arithmetic operation.

Sparr 2010-03-04 16:33:16

That's a pretty fast implementation, and will work in a template for any size int. The one I had in mind would probably have been slower, as it was based on a byte* and a count, which adds flexibility at a cost. If the input type is fixed, the loop could be unrolled for more speed.

Steven Sudit 2010-03-04 21:14:03

Also, in all cases, the final right-shift is unnecessary. Therefore, moving the while into the loop (by changing into into an if/break) would speed it up slightly. Unrolling would be a bigger win, though. I wonder if the unrolling could be done by a template instead of by hand...

Steven Sudit 2010-03-04 21:15:40

moved the shift, short circuiting should take care of it, AND now I save a loop iteration when the answer is a multiple of 8.

Sparr 2010-03-04 22:25:56

Answer 8

+5 A:

phkahler 2010-03-04 16:50:38

Very nice and no lookup table at all. Thanks!

IVlad 2010-03-04 16:54:21

However, is this better than the other solutions? Other than using less memory, the 256 ints lookup table solution executes less operations I think. It's hard to test since they are all very fast anyway, but which one do you think would win on a very slow computer?

IVlad 2010-03-04 17:01:10

It should be faster than any solution with a loop or conditional, but you really need to test to know for sure.

phkahler 2010-03-04 17:59:55

The loop for the lookup solution could be unrolled, but I think you're stuck with the conditional.

Steven Sudit 2010-03-04 21:06:40

faster is very platform dependent. give me an ARM with conditional execution any day. for the uninitiated, almost every arm operation can be conditional. instead of having 16 branch commands (==0, <0, >=0, <=, <, >, >=, etc...) you have 4 bits of the instruction word dedicated to giving those same conditionals to everything. So there is just one branch operation, and you can make it a Branch If Less Than Or Equal with a modifier, but the same modifier can turn the Add operator into in Add If Less Than Or Equal, without any processing overhead.

Sparr 2010-03-05 05:27:58

You can also drop some of the masks since you know certain additions won't overflow. i.e. the last 2 lines are just adding the bytes in b, none of which contain a value over 8. Again - test ;-)

phkahler 2010-03-05 15:17:13

@Sparr: Interesting. I can certainly see how avoiding branches could speed things up.

Steven Sudit 2010-03-06 01:47:28

Answer 9

A:

lsalamon 2010-03-04 17:53:43

I believe the reason you got downvoted is that your solution is essentially the bitwise linear search that the OP already had and was trying to improve upon.

Steven Sudit 2010-03-04 21:17:22

A proof of concept would show who has better performance. Perfection is not the complexity.

lsalamon 2010-03-05 12:11:44

I'm going to take a guess and say that your implementation is slower than Sparr's.

Steven Sudit 2010-03-05 21:53:03

The algorithm is exposed as not working, at least I can not make it work.

lsalamon 2010-03-06 17:37:45

No, the OP didn't say their algorithm didn't work. They just wanted something better than bitwise searching. But all you provided was an example of bitwise searching.

Steven Sudit 2010-03-08 08:19:55

Answer 10

A:

Implementation based on Ignacio Vazquez-Abrams's answer

uint8_t trailing_ones(uint32_t i) {
 return log2(~i & (i + 1));
}

Implementation of log2() is left as an exercise for the reader (see here)

Sparr 2010-03-04 22:12:48

Answer 11

+1 A:

GCC has __builtin_ctz and other compilers have their own intrinsics. Just protect it with an #ifdef:

#ifdef __GNUC__
int trailingones( uint32_t in ) {
    return ~ in == 0? 32 : __builtin_ctz( ~ in );
}
#else
// portable implementation
#endif

On x86, this builtin will compile to one very fast instruction. Other platforms might be somewhat slower, but most have some kind of bit-counting functionality that will beat what you can do with pure C operators.

Potatoswatter 2010-03-05 04:39:16

ansaurus

tags:

views:

answers:

Getting the number of trailing 1 bits

related questions