tags:

views:

1600

answers:

5

I'm using Java and I'm coding a chess engine.

I'm trying to find the index of the first 1 bit and the index of the last 1 bit in a byte.

I'm currently using Long.numberOfTrailingZeros() (or something like that) in Java, and would like to emulate that functionality, except with bytes.

Would it be something like:

byte b = 0b011000101;
int firstOneBit = bitCount ((b & -b) - 1);

If so, how would I implement bitCount relatively efficiently. I don't mind good explainations, please don't just give me code.

+2  A: 

use a lookup tabel with 256 entries. to create it:

unsigned int bitcount ( unsigned int i ) {
unsigned int r = 0;
while ( i ) { r+=i&1; i>>=1; } /* bit shift is >>> in java afair */
return r; 
}

this of course does not need to be fast as you do it at most 256 times to init your tabel.

starmole
Interesting, using a lookup table to precalculate a lookup table! However, it still doesn't answer my question, because I'm not hardcoding 256 entries of a lookup table by hand.
And, yet, you marked this as the answer to the question. Hmmm...
RobH
It's because I'm stupid and didn't understand his answer at first.
+6  A: 

Anybody who likes these sorts of bit-manipulation questions will enjoy Hank Warren's superb book Hacker's Delight where he shows there are more tricks to bit functions than just lookup tables.

Lookup tables have their place---but sometimes you can do way cool stuff with just a few instructions.

Norman Ramsey
I agree completely. I wish I could bump this more. That's an amazing book.
Ken Paul
Yeah, I heard the java libraries use the tricks from that book.Sadly, I stopped trying to do this with Java, since it's byte support is horrible (it converts things to ints, so you have to make sure you're using only the byte!), and it uses signed bytes.
A: 

The correct answer is that most all processors have some special instructions to do this sort of thing (leading zeros, trailing zeros, number of ones, etc). x86 has bsf/bsr, powerpc has clz, and so on. Hopefully Integer.numberOfTrailingZeros is smart enough to use these, but that's probably the only way that has a chance of using this sort of platform-specific function in Java (if it even uses them).

The Aggregate Magic Algorithms is another place with some approaches to this sort of problem, ranging from the obvious (lookup tables), to some rather clever SWAR approaches. But I suspect they all lose to Integer(x).numberOfTrailingZeros() if the java runtime is smart about the latter; it ought to be possible to optimize out the boxing and use a platform-specific technique for numberOfTrailingZeros, and if it does both that'll win.

Just for completeness, the other classic archive of brilliant bit-whacking is the old MIT HAKMEM collection (there's also a semi-modernized C version if your PDP-6/10 assembler skills have gotten rusty).

puetzk
A: 

If you assume that Long.numberOfTrailingZeros is fast (i.e. JIT compiled/optimized to use a single ASM instructions when available), then why can't you simply do something like this:

max(8,Long.numberOfTrailingZeros(val))

where val is your byte value converted to a Long. This is also assuming that max() is available and again optimizes to use asm select or max instructions.

Theoretically, on a machine that supports it, these operations could be JIT compiled to two assembler instructions.

Adisak
+1  A: 

/* Count Leading Zeroes */

static uint8_t clzlut[256] = {
  8,7,6,6,5,5,5,5,
  4,4,4,4,4,4,4,4,
  3,3,3,3,3,3,3,3,
  3,3,3,3,3,3,3,3,
  2,2,2,2,2,2,2,2,
  2,2,2,2,2,2,2,2,
  2,2,2,2,2,2,2,2,
  2,2,2,2,2,2,2,2,
  1,1,1,1,1,1,1,1,
  1,1,1,1,1,1,1,1,
  1,1,1,1,1,1,1,1,
  1,1,1,1,1,1,1,1,
  1,1,1,1,1,1,1,1,
  1,1,1,1,1,1,1,1,
  1,1,1,1,1,1,1,1,
  1,1,1,1,1,1,1,1,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0
};

uint32_t clz(uint32_t val)
{
  uint32_t accum = 0;

  accum += clzlut[val >> 24];
  accum += (accum == 8 ) ? clzlut[(val >> 16) & 0xFF] : 0;
  accum += (accum == 16) ? clzlut[(val >>  8) & 0xFF] : 0;
  accum += (accum == 24) ? clzlut[ val        & 0xFF] : 0;

  return accum;     
}

Explanation:

This works by storing the number of leading zeroes for each permutation of a byte as a lookup table. You use the byte value to look up the count of leading zeroes for that value. Since the example does this for an unsigned int, you shift and mask the four individual bytes, and accumulate the lookups accordingly. The ternary statement is used to stop the accumulation as soon as we find a bit which is set. That the accumulated value is 8, 16 or 24 implies that no set bit is found so far.

Also, some architectures have hardware support for this (as an instruction). The assembly mnemonic is often called 'CLZ' or 'BSR'. They are abbreviations for "Count leading Zeroes" and "Bit Scan Reverse" respectively.

Mads Elvheim