I have written a function that reads an input buffer of bytes and produces an output buffer of words, one word per input bit: 0x0081 for each ON bit and 0x007F for each OFF bit. The length of the input buffer is given, and both arrays are large enough. I also have about 2 KB of free RAM that I can use for lookup tables or similar.
I have found that this function is the bottleneck in a real-time application, where it is called very frequently. Can you suggest a way to optimize it? One possibility I see is to use a single buffer and do the substitution in place.
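    void inline BitsToWords(int8 *pc_BufIn,
                            int16 *pw_BufOut,
                            int32 BufInLen)
    {
        int32 i, j, z = 0;

        /* One output word per input bit, most significant bit first. */
        for (i = 0; i < BufInLen; i++)
        {
            for (j = 0; j < 8; j++, z++)
            {
                /* Bit set -> 0x0081, bit clear -> 0x007F. */
                pw_BufOut[z] =
                    (((pc_BufIn[i] >> (7 - j)) & 0x01) == 1 ?
                        0x0081 : 0x007f);
            }
        }
    }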
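To make the in-place idea concrete, here is a minimal sketch of what I have in mind. It assumes the packed input bytes are stored at the start of the same buffer that receives the words, and that the buffer holds at least 8 * BufInLen words. The name BitsToWordsInPlace is hypothetical, and the <stdint.h> types stand in for the project's own int16/int32 typedefs. Whether this is actually faster would have to be measured; it mainly saves the second buffer.

```c
#include <stdint.h>

/* Hypothetical in-place variant (a sketch, not the project's code).
   Assumption: the first BufInLen bytes of pw_Buf hold the packed input,
   and pw_Buf has room for 8 * BufInLen words. */
void BitsToWordsInPlace(int16_t *pw_Buf, int32_t BufInLen)
{
    /* Reading through unsigned char is allowed to alias any object and
       avoids sign extension when shifting. */
    const unsigned char *pc_In = (const unsigned char *)pw_Buf;
    int32_t i;
    int j;

    /* Walk backwards: byte i expands into words 8*i .. 8*i+7, which lie
       at or beyond byte offset 16*i, so no unread input byte (index < i)
       is overwritten. */
    for (i = BufInLen - 1; i >= 0; i--)
    {
        unsigned char b = pc_In[i];      /* read the byte before writing */
        int16_t *pw_Out = pw_Buf + 8 * i;

        for (j = 0; j < 8; j++)
        {
            pw_Out[j] = ((b >> (7 - j)) & 0x01) ? 0x0081 : 0x007F;
        }
    }
}
```

The caller would copy the input bytes to the start of the word buffer and then call BitsToWordsInPlace(buffer, length).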
Please do not suggest any library-specific, compiler-specific, or CPU/hardware-specific optimizations, because this is a multi-platform project.
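As an illustration of the kind of portable, table-based approach that would fit these constraints, here is a rough sketch using a nibble lookup table. With 16-bit words, a full per-byte table (256 entries of 8 words) would need 4096 bytes and exceed the 2 KB budget, while a per-nibble table (16 entries of 4 words) needs only 128 bytes. The names NibbleTable, InitNibbleTable and BitsToWordsLut are hypothetical, and the <stdint.h> types are an assumption standing in for the project's typedefs.

```c
#include <stdint.h>

/* 16 nibble values x 4 output words = 128 bytes with 16-bit words. */
static int16_t NibbleTable[16][4];

/* Fill the table once at start-up (hypothetical helper). */
void InitNibbleTable(void)
{
    int n, k;

    for (n = 0; n < 16; n++)
    {
        for (k = 0; k < 4; k++)
        {
            /* Most significant bit of the nibble comes first. */
            NibbleTable[n][k] = ((n >> (3 - k)) & 0x01) ? 0x0081 : 0x007F;
        }
    }
}

/* Same output as BitsToWords, but two table lookups per input byte
   instead of eight shift/mask/compare steps. */
void BitsToWordsLut(const int8_t *pc_BufIn,
                    int16_t *pw_BufOut,
                    int32_t BufInLen)
{
    int32_t i;

    for (i = 0; i < BufInLen; i++)
    {
        /* Convert to unsigned first so the table index is never negative. */
        unsigned int b = (unsigned char)pc_BufIn[i];
        const int16_t *pw_Hi = NibbleTable[b >> 4];
        const int16_t *pw_Lo = NibbleTable[b & 0x0F];

        pw_BufOut[0] = pw_Hi[0];
        pw_BufOut[1] = pw_Hi[1];
        pw_BufOut[2] = pw_Hi[2];
        pw_BufOut[3] = pw_Hi[3];
        pw_BufOut[4] = pw_Lo[0];
        pw_BufOut[5] = pw_Lo[1];
        pw_BufOut[6] = pw_Lo[2];
        pw_BufOut[7] = pw_Lo[3];
        pw_BufOut += 8;
    }
}
```

InitNibbleTable() has to run once before the first call to BitsToWordsLut(). The sketch uses plain assignments rather than a library copy routine to stay within the portability constraint; any speed-up would need to be confirmed by measurement on each target.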