ansaurus

Question

Why does my data not seem to be aligned?

Answer 1

+8 A:

m_sincos[t] is a C expression. Inside an __asm statement, however, it is interpreted as an x86 addressing mode, with a completely different result. For example, VS2008 SP1 compiles:

movaps xmm0, m_sincos[t]

into the following (see the Disassembly window when the app crashes in debug mode):

movaps xmm0, xmmword ptr [t]

That interpretation attempts to copy a 128-bit value stored at the address of the variable t into xmm0. t, however, is a 32-bit value at an address that is unlikely to be 16-byte aligned. Executing the instruction is likely to cause an alignment fault, and in the rare case where t's address happens to be aligned, it would silently load the wrong data.
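To make the alignment requirement concrete, here is a minimal sketch (not from the original answer) of a check for whether an address is safe for movaps. The helper name is_aligned16 is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>

// Hypothetical helper: true if p is a multiple of 16, which is what
// movaps (and _mm_load_ps) requires of its memory operand.
inline bool is_aligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// An int only guarantees its own, smaller alignment; an __m128 guarantees 16.
static_assert(alignof(int) < 16, "int is not 16-byte aligned by type");
static_assert(alignof(__m128) == 16, "__m128 is 16-byte aligned by type");
```

An int local such as t may land on a 16-byte boundary by accident, but nothing guarantees it, which is why the crash can appear intermittent.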

You could fix this by using an appropriate x86 addressing mode. Here's the slow but clear version:

__asm mov eax, m_sincos                  ; eax <- m_sincos
__asm mov ebx, dword ptr t
__asm shl ebx, 4                         ; ebx <- t * 16 ; each array element is 16-bytes (128 bit) long
__asm movaps xmm0, xmmword ptr [eax+ebx] ; xmm0 <- m_sincos[t]
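The address arithmetic in those four instructions can be mirrored in C++ as a sanity check. This is a minimal sketch rather than part of the original answer: the helper names element_address and load_element are hypothetical, and it uses the _mm_load_ps intrinsic (an aligned 128-bit load) in place of inline assembly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>

// Hypothetical helper: computes the address of element t the same way the
// assembly does, base + (t << 4), because sizeof(__m128) is 16 bytes.
inline const float* element_address(const __m128* base, std::size_t t)
{
    const char* bytes = reinterpret_cast<const char*>(base);
    return reinterpret_cast<const float*>(bytes + (t << 4));
}

// Hypothetical helper: aligned 128-bit load of element t.
inline __m128 load_element(const __m128* base, std::size_t t)
{
    const float* p = element_address(base, t);
    // An aligned load faults on addresses that are not multiples of 16.
    assert(reinterpret_cast<std::uintptr_t>(p) % 16 == 0);
    return _mm_load_ps(p);
}
```

With a 16-byte-aligned base, element_address(base, t) should equal the address of base[t], which is exactly what the compiler computes for the C expression m_sincos[t].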

Sidenote:

When I put this in a complete program, something odd occurs:

#include <math.h>
#include <malloc.h>    // _aligned_malloc
#include <xmmintrin.h>

int main()
{
    static __m128 *m_sincos;
    int Bins = 4;

    m_sincos = (__m128*) _aligned_malloc(Bins*sizeof(__m128), 16);
    for (int t=0; t<Bins; t++) {
        m_sincos[t] = _mm_set_ps(cos((float) t), sin((float) t), sin((float) t), cos((float) t));
        __asm movaps xmm0, m_sincos[t];
        __asm mov eax, m_sincos
        __asm mov ebx, t
        __asm shl ebx, 4
        __asm movaps xmm0, [eax+ebx];
    }

    return 0;
}

If you run this and keep an eye on the Registers window, you'll see that although the results are correct, xmm0 already holds the correct value before the movaps instruction executes. How does that happen?

A look at the generated assembly code shows that _mm_set_ps() loads the sin/cos results into xmm0, then saves it to the memory address of m_sincos[t]. But the value remains there in xmm0 too. _mm_set_ps is an 'intrinsic', not a function call; it does not attempt to restore the values of registers it uses after it's done.

If there's a lesson to take from this, it might be that when using the SSE intrinsic functions, use them throughout, so the compiler can optimize things for you. Otherwise, if you're using inline assembly, use that throughout too.
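As an illustration of the intrinsics-only route, here is a minimal sketch of the same table built and read without any __asm. The helper names fill_sincos and sum_sincos are hypothetical, and the summing loop is only a stand-in for whatever the real code does with each entry.

```cpp
#include <cassert>
#include <cmath>
#include <xmmintrin.h>

// Hypothetical helper: fills table[0..bins) with the same four values the
// program above passes to _mm_set_ps (lowest lane is the last argument).
inline void fill_sincos(__m128* table, int bins)
{
    assert(table != nullptr && bins >= 0);
    for (int t = 0; t < bins; ++t) {
        const float c = std::cos(static_cast<float>(t));
        const float s = std::sin(static_cast<float>(t));
        table[t] = _mm_set_ps(c, s, s, c);
    }
}

// Hypothetical helper: adds up all entries. Indexing the __m128 array lets
// the compiler generate the scaled, aligned loads itself.
inline __m128 sum_sincos(const __m128* table, int bins)
{
    assert(table != nullptr && bins >= 0);
    __m128 acc = _mm_setzero_ps();
    for (int t = 0; t < bins; ++t) {
        acc = _mm_add_ps(acc, table[t]);
    }
    return acc;
}
```

The table can still come from _aligned_malloc as in the program above; because no registers are named explicitly, the compiler is free to keep values in xmm registers across the two loops.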

Oren Trutner 2010-06-04 15:58:29

@Oren Trutner - Wow, that's probably the best answer that I've read in all of my searching, thanks for the clear explanation! So, if I wanted to use assembly throughout, does that mean that I would have to do the shl instruction to move to the correct position in my array just as you do with the intrinsics? Thanks very much!!

Brett 2010-06-04 16:07:37

Yes, you need to multiply the array index by 16 to get the correct byte offset. x86 has addressing modes that multiply the index for you, avoiding the need to shift explicitly, but the scale factor can only be 1, 2, 4, or 8, so none of them multiplies by 16. An alternative is to keep a byte offset instead of an index and increment it by 16 on each iteration.
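The "increment by 16 on each iteration" alternative might look like the following when written with intrinsics. This is a sketch under assumptions, not code from the thread: the helper name sum_by_offset is hypothetical, and summing is just a placeholder for the per-element work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>

// Hypothetical helper: walks the array with a byte offset that advances by
// 16 per element, so no multiply or shift is needed inside the loop.
inline __m128 sum_by_offset(const __m128* base, std::size_t count)
{
    // Aligned loads require the base address to be a multiple of 16.
    assert(reinterpret_cast<std::uintptr_t>(base) % 16 == 0);
    const char* bytes = reinterpret_cast<const char*>(base);
    const std::size_t end = count * sizeof(__m128);
    __m128 acc = _mm_setzero_ps();
    for (std::size_t offset = 0; offset < end; offset += sizeof(__m128)) {
        const float* p = reinterpret_cast<const float*>(bytes + offset);
        acc = _mm_add_ps(acc, _mm_load_ps(p));
    }
    return acc;
}
```

In hand-written assembly the same idea means adding 16 to the offset register at the bottom of the loop instead of recomputing t * 16 with shl each time.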

Oren Trutner 2010-06-04 16:13:09

learned something new today. thank you

aaa 2010-06-04 16:13:19

Answer 2

+1 A:

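You should always use the intrinsics, or even just enable SSE code generation in the compiler and leave it to the optimizer, rather than coding the instructions explicitly. This is because __asm is not portable to 64-bit code: MSVC does not support inline assembly when targeting x64.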

DeadMG 2010-06-04 16:55:07

Thanks for the suggestion, I was just reading into that when you posted!

Brett 2010-06-04 17:55:39
