I have a 64-bit number (though only the 42 low-order bits are used) and need to compute the sum of the 4 bits at positions n, n+m, n+m*2 and n+m*3 (note: anything that can produce a sum > 4 is invalid) for some fixed m and for every value of n that keeps all four bits inside the number.
As an example, using m=3 and given the 16-bit number
0010 1011 0110 0001
I need to compute the 7 window sums
2, 3, 1, 2, 3, 0, 3
(one per valid n, listed here from the highest starting position, n=6, down to n=0).
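For concreteness, here's the naive version of what I'm computing, a straight transcription of the definition with nothing clever about it (the names In, m, width and out are just placeholders; printing out[n] from high n down to n=0 reproduces the list above):

// Naive reference: for each valid window start n, add the 4 strided bits.
// `width` is the number of meaningful bits (42 for my real input, 16 above).
for (int n = 0; n + 3 * m < width; n++)
{
    out[n] = ((In >> n) & 1) + ((In >> (n + m)) & 1)
           + ((In >> (n + 2 * m)) & 1) + ((In >> (n + 3 * m)) & 1);
}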
Does anyone have any (cool) ideas for ways to do this? I'm fine with bit twiddling.
My current thought is to make bit-shifted copies of the input to align the values to be summed, and then build a logic tree that does a 4-way 1-bit add.
uint64_t v1 = In;       // at position p: bit p of In
uint64_t v2 = In << 3;  // at position p: bit p-3
uint64_t v3 = In << 6;  // at position p: bit p-6
uint64_t v4 = In << 9;  // at position p: bit p-9, so the window starting
                        // at n is aligned at position n+9

// two half-adders: v1+v2 = 2*a2 + a1 and v3+v4 = 2*b2 + b1
uint64_t a1 = v1 ^ v2;
uint64_t a2 = v1 & v2;
uint64_t b1 = v3 ^ v4;
uint64_t b2 = v3 & v4;

// combine so that each position holds sum = 4*o4 + 2*o2 + o1
uint64_t c2 = a1 & b1;  // carry out of the ones place
uint64_t d2 = a2 ^ b2;  // twos place (the both-carry case goes to o4)
uint64_t o1 = a1 ^ b1;
uint64_t o2 = c2 ^ d2;  // c2 and d2 can never both be set, so ^ is safe
uint64_t o4 = a2 & b2;  // set only when all four bits are 1
This does end up with the bits of the results spread across 3 different ints (o1, o2 and o4 hold bits 0, 1 and 2 of each sum) but oh well.
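For reference, a minimal sketch of reading an individual sum back out of those three words, assuming the m=3 shifts above (sum_at is a name I made up, not part of anything standard; note that positions below 9 saw zero-filled bits shifted in, so only windows that fit entirely in the input are meaningful):

#include <cstdint>

// With the left shifts above, the sum for the window starting at n
// lands at bit position n + 9.
uint64_t sum_at(uint64_t o4, uint64_t o2, uint64_t o1, unsigned n)
{
    unsigned p = n + 9;  // result position for window start n
    return ((o4 >> p) & 1) * 4 + ((o2 >> p) & 1) * 2 + ((o1 >> p) & 1);
}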
edit: as it happens I need the histogram of the sums, so doing a bit-count of o4, o2&o1, o2 and o1 gives me what I want.
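Spelled out, that recovery looks something like the sketch below. It assumes a `valid` mask (my addition) that has a 1 at each in-range result position, and it uses GCC/Clang's __builtin_popcountll for the bit-counts, which is an assumption about the toolchain:

#include <cstdint>

// Recover the full histogram h[0..4] from four popcounts.
// `valid` masks off result positions whose windows fall outside the
// number; N is the number of valid windows.
void histogram(uint64_t o1, uint64_t o2, uint64_t o4,
               uint64_t valid, int N, int h[5])
{
    o1 &= valid; o2 &= valid; o4 &= valid;
    h[4] = __builtin_popcountll(o4);         // sums of exactly 4
    h[3] = __builtin_popcountll(o2 & o1);    // 2-bit and 1-bit both set
    h[2] = __builtin_popcountll(o2) - h[3];  // 2-bit set, 1-bit clear
    h[1] = __builtin_popcountll(o1) - h[3];  // 1-bit set, 2-bit clear
    h[0] = N - h[1] - h[2] - h[3] - h[4];    // everything else
}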
A second solution uses a perfect hash function:
static const int arr[16] = {0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};  // popcount of 0..15

for (int i = 0; i < N; i++)
{
    // the mask selects bits i, i+3, i+6, i+9 of the original input
    out[i] = arr[(In & 0b1001001001) % 30];
    In >>= 1;
}
This works by noting that the 4 selected bits can only take on 16 patterns and that (found by guess and check) mod 30 hashes those patterns perfectly onto 0-15. From there, a table of precomputed sums gives the needed value. As it happens, only 3 of the 4 strides I need work this way.
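The reason 30 works for m=3 is easy to see: the four selected bits have place values 1, 8, 64 and 512, and mod 30 those reduce to 1, 8, 4 and 2, which are distinct powers of two, so every one of the 16 patterns gets a distinct index in 0-15. And since arr is just a popcount table, it doesn't matter that the bits land in a permuted order. A throwaway check (my own, not part of the original approach):

#include <cassert>
#include <cstdio>

int main()
{
    int seen = 0;  // bitset of hash values already produced
    for (int bits = 0; bits < 16; bits++)
    {
        // Spread the 4 pattern bits out to positions 0, 3, 6, 9.
        int x = (bits & 1) | ((bits >> 1) & 1) << 3
              | ((bits >> 2) & 1) << 6 | ((bits >> 3) & 1) << 9;
        int h = x % 30;
        assert(h < 16 && !(seen & (1 << h)));  // in range, no collision
        seen |= 1 << h;
    }
    printf("mod 30 is a perfect hash for stride 3\n");
}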
p.s.
Correct trumps fast. Fast trumps clear. I expect to be running this millions of times.