Jaap Weidemann
2010-06-02 18:40:57
Works like a charm. Thanks!
BitTwiddler1011
2010-06-02 20:14:13
+1
A:
If you're on a 32-bit machine, split w
into two 32-bit words, calculate the popcount separately for each half, then add up. This will get rid of some unneeded operations that are required to synthesize 64-bit operations from 32-bit ones (shifts, mults...). This also allows for increased parallelism if you interleave the calculations.
If you're compiling 64-bit code, you may try this:
int popcnt64(uint64_t w)
{
uint64_t w1 = (w & 0x2222222222222222) + ((w+w) & 0x2222222222222222);
uint64_t w2 = (w >> 1 & 0x2222222222222222) + (w >> 2 & 0x2222222222222222);
w1 = w1 + (w1 >> 4) & 0x0f0f0f0f0f0f0f0f;
w2 = w2 + (w2 >> 4) & 0x0f0f0f0f0f0f0f0f;
return (w1 + w2) * 0x0101010101010101 >> 57;
}
This contains more operations, but gives more opportunities of parallel execution to the CPU. On newer CPUs it should be slightly faster, on others it will be slightly slower.
slacker
2010-06-02 18:55:04