I currently have the following code:
float a[4] = { 10, 20, 30, 40 };
float b[4] = { 0.1, 0.1, 0.1, 0.1 };
asm volatile("movups (%0), %%xmm0\n\t"
"mulps (%1), %%xmm0\n\t"
"movups %%xmm0, (%1)"
: /* no outputs */
: "r" (a), "r" (b)
: "xmm0", "memory"); /* the asm overwrites xmm0 and writes through b */
I have a few questions:
(1) If I were to align the arrays on 16-byte boundaries, would it even work? Since the arrays are allocated on the stack, is it true that aligning them is nearly impossible?
See the selected answer for this post: http://stackoverflow.com/questions/841433/gcc-attributealignedx-explanation
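To make question (1) concrete, here is a rough sketch of the aligned variant I have in mind. It assumes GCC on x86-64 and that `__attribute__((aligned(16)))` is honored for stack locals, which is exactly the part I am unsure about. The names `mul4_aligned`, `is_aligned16` and `run_aligned_demo` are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: b[i] = a[i] * b[i] for 4 floats.
   Assumes both pointers are 16-byte aligned: movaps faults on an
   unaligned address, and so does mulps with a memory operand. */
static void mul4_aligned(const float *a, float *b)
{
    __asm__ volatile("movaps (%0), %%xmm0\n\t"
                     "mulps  (%1), %%xmm0\n\t"
                     "movaps %%xmm0, (%1)"
                     : /* no outputs */
                     : "r" (a), "r" (b)
                     : "xmm0", "memory");
}

/* Returns 1 if p sits on a 16-byte boundary, 0 otherwise. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Declares the arrays as stack locals with the aligned attribute,
   multiplies them, copies the result to out, and reports whether
   the compiler actually aligned both arrays. */
int run_aligned_demo(float out[4])
{
    float a[4] __attribute__((aligned(16))) = { 10, 20, 30, 40 };
    float b[4] __attribute__((aligned(16))) = { 0.1f, 0.1f, 0.1f, 0.1f };
    int ok = is_aligned16(a) && is_aligned16(b);

    if (ok)
        mul4_aligned(a, b);   /* only safe when both are aligned */
    for (int i = 0; i < 4; i++)
        out[i] = b[i];
    return ok;
}
```

If `run_aligned_demo` returns 0, the attribute was not honored on that compiler/target and the `movaps` path would have faulted.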
(2) Could the code be refactored to make it more efficient? For example, what if I loaded both float arrays into registers rather than just one?
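For question (2), one refactoring I am considering is to drop the inline asm and use the SSE intrinsics from `<xmmintrin.h>`, loading both arrays into registers and letting the compiler pick the registers and schedule the instructions. This is only a sketch, assuming an x86 target with SSE enabled; `mul4_intrin` is a name I made up, and I have not measured whether it is actually faster.

```c
#include <xmmintrin.h>

/* Hypothetical helper: b[i] = a[i] * b[i] for 4 floats.
   Uses unaligned loads/stores, so no alignment is assumed. */
void mul4_intrin(const float *a, float *b)
{
    __m128 va = _mm_loadu_ps(a);      /* load a[0..3] into a register */
    __m128 vb = _mm_loadu_ps(b);      /* load b[0..3] into a register */
    __m128 prod = _mm_mul_ps(va, vb); /* four multiplies at once */
    _mm_storeu_ps(b, prod);           /* write the products back to b */
}
```

If the arrays were known to be 16-byte aligned, `_mm_load_ps` and `_mm_store_ps` could replace the unaligned variants.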
Thanks