Here's my solution, but even more importantly my approach to solving the problem.
I approached the problem by
- drawing the memory cells and drawing arrows from the destination to the source.
- made a table showing the above drawing.
- labeling each row in the table with the relative byte address.
This showed me the pattern:
- let iL be the low nybble (half byte) of a[i]
- let iH be the high nybble of a[i]
- iH = (i+1)L
- iL = (i+2)H
This pattern holds for all bytes.
Translating into C, this means:
a[i] = (iH << 4) OR iL
a[i] = ((a[i+1] & 0x0f) << 4) | ((a[i+2] & 0xf0) >> 4)
We now make three more observations:
- since we carry out the assignments left to right, we don't need to store any values in temporary variables.
- we will have a special case for the tail: all 12 bits at the end will be zero.
- we must avoid reading undefined memory past the array. since we never read more than a[i+2], this only affects the last two bytes
So, we
- handle the general case by looping for N-2 bytes and performing the general calculation above
- handle the next to last byte by it by setting iH = (i+1)L
- handle the last byte by setting it to 0
given a with length N, we get:
for (i = 0; i < N - 2; ++i) {
a[i] = ((a[i+1] & 0x0f) << 4) | ((a[i+2] & 0xf0) >> 4);
}
a[N-2] = (a[N-1) & 0x0f) << 4;
a[N-1] = 0;
And there you have it... the array is shifted left by 12 bits. It could easily be generalized to shifting N bits, noting that there will be M assignment statements where M = number of bits modulo 8, I believe.
The loop could be made more efficient on some machines by translating to pointers
for (p = a, p2=a+N-2; p != p2; ++p) {
*p = ((*(p+1) & 0x0f) << 4) | (((*(p+2) & 0xf0) >> 4);
}
and by using the largest integer data type supported by the CPU.
(I've just typed this in, so now would be a good time for somebody to review the code, especially since bit twiddling is notoriously easy to get wrong.)