I'm very new to SSE and have optimized a section of code using intrinsics. I'm pleased with the operation itself, but I'm looking for a better way to write the result. The results end up in three _m128i
variables.
What I'm trying to do is store specific bytes from the result values to non-contiguous memory locations. I'm currently doing this:
__m128i values0,values1,values2;
/*Do stuff and store the results in values0, values1, and values2*/
y[0] = (BYTE)_mm_extract_epi16(values0,0);
cb[2]=cb[3] = (BYTE)_mm_extract_epi16(values0,2);
y[3] = (BYTE)_mm_extract_epi16(values0,4);
cr[4]=cr[5] = (BYTE)_mm_extract_epi16(values0,6);
cb[0]=cb[1] = (BYTE)_mm_extract_epi16(values1,0);
y[1] = (BYTE)_mm_extract_epi16(values1,2);
cr[2]=cr[3] = (BYTE)_mm_extract_epi16(values1,4);
y[4] = (BYTE)_mm_extract_epi16(values1,6);
cr[0]=cr[1] = (BYTE)_mm_extract_epi16(values2,0);
y[2] = (BYTE)_mm_extract_epi16(values2,2);
cb[4]=cb[5] = (BYTE)_mm_extract_epi16(values2,4);
y[5] = (BYTE)_mm_extract_epi16(values2,6);
Where y
, cb
, and cr
are byte (unsigned char
) arrays. This seems wrong to me for reasons I can't define. Does anyone have any suggestions for a better way?
Thanks!