I am trying to optimize a function using SSE2, and I'm wondering whether there is a better way to prepare the data for my assembly code. My source data is an array of unsigned chars at pSrcData. I copy every other byte into this array of floats, because my calculation needs to happen in float.


unsigned char *pSrcData = GetSourceDataPointer();

__declspec(align(16)) float vVectX[4];

vVectX[0] = (float)pSrcData[0];
vVectX[1] = (float)pSrcData[2];
vVectX[2] = (float)pSrcData[4];
vVectX[3] = (float)pSrcData[6];

__asm 
{
     movaps xmm0, [vVectX]
     [...]  // do some floating point calculations on float vectors using addps, mulps, etc
}

Is there a quicker way for me to cast every other byte of pSrcData to a float and store it into vVectX?

Thanks!

+2  A: 

(1) AND with a mask to zero out the odd bytes (PAND)

(2) Unpack from 16 bits to 32 bits (PUNPCKLWD with a zero vector)

(3) Convert 32 bit ints to floats (CVTDQ2PS)

Three instructions, once the source bytes have been loaded into an XMM register.
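
For illustration, here is a rough sketch of those three steps using SSE2 intrinsics from C rather than inline assembly. The helper name even_bytes_to_floats and the dst parameter are made up for this example, and the 8-byte load (MOVQ) and the zero/mask constants are extra setup not counted in the three steps. It assumes at least 8 readable bytes at src and a little-endian x86 target.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper (name is not from the question): converts
   src[0], src[2], src[4], src[6] to floats and writes them to dst[0..3].
   Assumes at least 8 readable bytes at src. */
static void even_bytes_to_floats(const unsigned char *src, float *dst)
{
    const __m128i zero = _mm_setzero_si128();
    const __m128i mask = _mm_set1_epi16(0x00FF);  /* keeps the low byte of each 16-bit word */

    /* Setup: load 8 source bytes into the low half of a register (MOVQ). */
    __m128i v = _mm_loadl_epi64((const __m128i *)src);

    v = _mm_and_si128(v, mask);        /* (1) PAND: zero out the odd bytes          */
    v = _mm_unpacklo_epi16(v, zero);   /* (2) PUNPCKLWD: widen 16-bit -> 32-bit     */
    __m128 f = _mm_cvtepi32_ps(v);     /* (3) CVTDQ2PS: 32-bit ints -> floats       */

    /* Unaligned store so any dst works; with a 16-byte aligned array such as
       vVectX in the question, _mm_store_ps could be used instead. */
    _mm_storeu_ps(dst, f);
}
```

In the question's setting this would be called as even_bytes_to_floats(pSrcData, vVectX), or the __m128 value could be kept in a register and fed straight into the rest of the calculation.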

Paul R