The simplest is probably a lookup table, if you've 4K free space:
static uint32_t t [ 1024 ] = { 0, 0x1, 0x8, ... };
uint32_t m ( int a, int b, int c )
{
return t[a] | ( t[b] << 1 ) | ( t[c] << 2 );
}
The bit hack uses shifts and masks to spread the bits out, so each time it shifts the value and ors it, copying some of the bits into empty spaces, then masking out combinations so only the original bits remain.
for example:
x = 0xabcd;
= 0000_0000_0000_0000_1010_1011_1100_1101
x = (x | (x << S[3])) & B[3];
= ( 0x00abcd00 | 0x0000abcd ) & 0xff00ff
= 0x00ab__cd & 0xff00ff
= 0x00ab00cd
= 0000_0000_1010_1011_0000_0000_1100_1101
x = (x | (x << S[2])) & B[2];
= ( 0x0ab00cd0 | 0x00ab00cd) & 0x0f0f0f0f
= 0x0a_b_c_d & 0x0f0f0f0f
= 0x0a0b0c0d
= 0000_1010_0000_1011_0000_1100_0000_1101
x = (x | (x << S[1])) & B[1];
= ( 0000_1010_0000_1011_0000_1100_0000_1101 |
0010_1000_0010_1100_0011_0000_0011_0100 ) &
0011_0011_0011_0011_0011_0011_0011_0011
= 0010_0010_0010_0011_0011_0000_0011_0001
x = (x | (x << S[0])) & B[0];
= ( 0010_0010_0010_0011_0011_0000_0011_0001 |
0100_0100_0100_0110_0110_0000_0110_0010 ) &
0101_0101_0101_0101_0101_0101_0101_0101
= 0100_0010_0100_0101_0101_0000_0101_0001
In each iteration, each block is split in two, the rightmost bit of the leftmost half of the block moved to its final position, and a mask applied so only the required bits remain.
Once you have spaced the inputs out, shifting them so the values of one fall into the zeros of the other is easy.
To extend that technique for more than two bits between values in the final result, you have to increase the shifts between where the bits end up. It gets a bit trickier, as the starting block size isn't a power of 2, so you could either split it down the middle, or on a power of 2 boundary.
So an evolution like this might work:
0000_0000_0000_0000_0000_0011_1111_1111
0000_0011_0000_0000_0000_0000_1111_1111
0000_0011_0000_0000_1111_0000_0000_1111
0000_0011_0000_1100_0011_0000_1100_0011
0000_1001_0010_0100_1001_0010_0100_1001
// 0000_0000_0000_0000_0000_0011_1111_1111
x = ( x | ( x << 16 ) ) & 0x030000ff;
// 0000_0011_0000_0000_0000_0000_1111_1111
x = ( x | ( x << 8 ) ) & 0x0300f00f;
// 0000_0011_0000_0000_1111_0000_0000_1111
x = ( x | ( x << 4 ) ) & 0x030c30c3;
// 0000_0011_0000_1100_0011_0000_1100_0011
x = ( x | ( x << 2 ) ) & 0x09249249;
// 0000_1001_0010_0100_1001_0010_0100_1001
Perform the same transformation on the inputs, shift one by one and another by two, or them together and you're done.