I have the following bottleneck function.
typedef unsigned char byte;

void CompareArrays(const byte * p1Start, const byte * p1End, const byte * p2, byte * p3)
{
    const byte b1 = 128-30;
    const byte b2 = 128+30;
    for (const byte * p1 = p1Start; p1 != p1End; ++p1, ++p2, ++p3) {
        *p3 = (*p1 < *p2) ? b1 : b2;
    }
}
I want to replace the C++ code with SSE2 intrinsic functions.
I have tried _mm_cmpgt_epi8, but it performs a signed comparison,
and I need an unsigned one.
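To illustrate the mismatch, here is a minimal sketch of what I am seeing. The helper name SignedLessThanLane0 is made up for this example and is not part of my real code; it broadcasts two bytes into SSE2 registers and reports what _mm_cmpgt_epi8 says about the first lane.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

typedef unsigned char byte;

// Hypothetical helper, for illustration only.
// Returns true if the SSE2 *signed* byte comparison reports a < b.
bool SignedLessThanLane0(byte a, byte b)
{
    // Broadcast each value into all 16 byte lanes.
    const __m128i va = _mm_set1_epi8(static_cast<char>(a));
    const __m128i vb = _mm_set1_epi8(static_cast<char>(b));

    // A lane becomes 0xFF where vb > va, with both treated as signed bytes.
    const __m128i mask = _mm_cmpgt_epi8(vb, va);

    // Bit 0 of the movemask is the top bit of lane 0.
    return (_mm_movemask_epi8(mask) & 1) != 0;
}
```

For a = 100 and b = 200 the scalar expression a < b is true, but this helper returns false, because 200 is reinterpreted as -56. Values on the same side of 128 compare as expected.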
Is there any trick (SSE, SSE2, SSSE3) to solve my problem?
Note: I do not want to use multi-threading in this case.