I currently have the following code:

float a[4] = { 10, 20, 30, 40 };
float b[4] = { 0.1, 0.1, 0.1, 0.1 };
asm volatile("movups (%0), %%xmm0\n\t"
             "mulps (%1), %%xmm0\n\t"
             "movups %%xmm0, (%1)"
             :: "r" (a), "r" (b)
             : "xmm0", "memory");  /* xmm0 is overwritten and b is written through a pointer */

I have a few questions:

(1) If I were to align the arrays on 16-byte boundaries, would it even work? Since the arrays are allocated on the stack, is it true that aligning them is nearly impossible?

See the selected answer to this post: http://stackoverflow.com/questions/841433/gcc-attributealignedx-explanation

(2) Could the code be refactored at all to make it more efficient? What if I put both float arrays in registers rather than just one?

Thanks

+1  A: 

Does GCC provide support for the __m128 data type? If so, that's your best bet for guaranteeing a 16-byte-aligned data type. In any case, there is __attribute__((aligned(16))) for aligning things. Define your arrays as follows:

float a[4] __attribute__((aligned(16))) = { 10, 20, 30, 40 };
float b[4] __attribute__((aligned(16))) = { 0.1, 0.1, 0.1, 0.1 };

and then use movaps instead of movups :)

Goz
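For comparison, the same multiply can be written with SSE intrinsics instead of inline assembly. This is only a sketch: the helper name mul4_aligned is made up for illustration, it multiplies into dst rather than storing into the second array as the question's asm does, and it assumes an x86 target with SSE enabled (the default on x86-64; 32-bit builds may need -msse).

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_load_ps, _mm_mul_ps, _mm_store_ps */

/* Hypothetical helper (not from the original post): dst[i] *= src[i] for 4 floats.
 * Both pointers must be 16-byte aligned, because _mm_load_ps and
 * _mm_store_ps are the aligned (movaps-style) load and store. */
static void mul4_aligned(float *dst, const float *src)
{
    /* Catch misaligned arguments early instead of faulting in the load. */
    assert(((uintptr_t)dst & 15) == 0);
    assert(((uintptr_t)src & 15) == 0);

    __m128 x = _mm_load_ps(dst);          /* aligned load of 4 floats */
    __m128 y = _mm_load_ps(src);          /* aligned load of 4 floats */
    _mm_store_ps(dst, _mm_mul_ps(x, y));  /* element-wise multiply, aligned store */
}
```

Called with the two aligned arrays above as mul4_aligned(a, b), it should leave a holding approximately { 1, 2, 3, 4 }.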
wow those "__"s really screw up the formatting. Anyone know how to fix that?
Goz
Thanks, but as stated in this question http://stackoverflow.com/questions/841433/gcc-attributealignedx-explanation it seems impossible to align arrays that are allocated on the stack (as opposed to global arrays allocated in .data)?
banister
@Goz, yes - use inline code blocks (backticks)
Dominic Rodger
Thanks for the fix, Bastien :) Banister, can you give it a try and see what happens? If the linked explanation is right, then it would be impossible to align things like double correctly, yet they DO get aligned.
Goz
Yes, I will soon. I have a feeling the linked explanation is wrong, as everyone in this question seems to imply. Thanks everyone! :)
banister
Thanks Dominic :)
Goz
@Goz, no problem! Bit bemused by @Bastien's edit, but never mind.
Dominic Rodger
+1  A: 

If I were to align the arrays on 16-byte boundaries, would it even work? Since the arrays are allocated on the stack, is it true that aligning them is nearly impossible?

Alignment on the stack has to work; otherwise intrinsics would not work. I would guess that the problem in the post you linked had to do with the exorbitant alignment value the poster chose.

Regarding (2):

No, there shouldn't be a difference in performance. See this site for the instruction timings of several processors.


How alignment of stack variables works:

push ebp
mov ebp, esp
and esp, -16    ; fffffff0H
sub esp, 200    ; 000000c8H

The and instruction aligns the start of the stack frame to a 16-byte boundary.

Christopher
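One way to check this on your own compiler is to declare an aligned local array and look at its address. A minimal sketch, assuming GCC or Clang (for the __attribute__ syntax); the function name stack_misalignment is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical check: returns the address of a local, 16-byte-aligned
 * array modulo 16. A result of 0 means the stack array really is aligned. */
static unsigned stack_misalignment(void)
{
    float v[4] __attribute__((aligned(16))) = { 1, 2, 3, 4 };
    unsigned rem = (unsigned)((uintptr_t)(void *)v & 15);

    assert(rem == 0);  /* fails loudly if the compiler ignored the attribute */
    return rem;
}
```

If the assertion never fires, the compiler is honoring the requested alignment for stack variables.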
+1  A: 

(1) If I were to align the arrays on 16-byte boundaries, would it even work? Since the arrays are allocated on the stack, is it true that aligning them is nearly impossible?

No, it's quite simple to align the stack pointer using and:

and esp, 0xFFFFFFF0 ; aligned on a 16-byte boundary

But you should use what GCC provides, such as a 16-byte type, or __attribute__ to customize alignment.

Bastien Léonard
Thanks for your answer. Would you be able to explain how you can use 'and' for alignment? I don't quite 'get' it :)
banister
Recall that `some_bit and 0 = 0` and `a/16 = a>>4` if a is unsigned. Using `and` like this will set the four least significant bits to zero, and leave the others unchanged. What happens if you divide `esp` by 16, actually? It gets right-shifted by 4, and the four “lost” bits are the remainder. Thus those four bits should be 0, so that `esp` is divisible by 16. What really happens is that it subtracts *at most* 15, so that `esp` % 16 == 0. (Subtracting from `esp` means allocating more space on the stack).
Bastien Léonard
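The same masking trick can be tried on ordinary integers in C. A small sketch (align_down_16 is a made-up name), mirroring what and esp, 0xFFFFFFF0 does to the stack pointer:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round addr down to the previous multiple of 16
 * by clearing its four least significant bits. */
static uintptr_t align_down_16(uintptr_t addr)
{
    uintptr_t aligned = addr & ~(uintptr_t)15;

    assert(aligned % 16 == 0);     /* the low four bits are now zero */
    assert(addr - aligned <= 15);  /* at most 15 was subtracted */
    return aligned;
}
```

For example, 0x1007 becomes 0x1000, while 0x1010 is already a multiple of 16 and stays unchanged.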
+5  A: 

Write it in C and compile with

gcc -S -mssse3

if you have a fairly recent version of gcc.

xcramps
What C code would compile to those SSE instructions? Do you have an example?
banister
float a[4] = { 10, 20, 30, 40 };
float b[4] = { 0.1, 0.1, 0.1, 0.1 };

void foo(void)
{
    int i;
    for (i = 0; i < 4; i++)
        a[i] *= b[i];
}

Compile as shown and examine the .s file.
xcramps
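If the arrays are known to be aligned, the compiler can be told so, which may let it pick aligned loads when it vectorizes the loop. A hedged sketch: mul4 is a hypothetical name, __builtin_assume_aligned is a GCC/Clang builtin, and whether the output actually contains mulps depends on the compiler version and optimization level (auto-vectorization usually needs something like -O3).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical function: a plain C loop that the compiler is free to vectorize.
 * restrict promises that dst and src do not overlap. */
void mul4(float *restrict dst, const float *restrict src)
{
    /* Promising alignment that does not hold is undefined behavior, so check first. */
    assert(((uintptr_t)dst & 15) == 0);
    assert(((uintptr_t)src & 15) == 0);

    float *d = __builtin_assume_aligned(dst, 16);
    const float *s = __builtin_assume_aligned(src, 16);

    for (int i = 0; i < 4; i++)
        d[i] *= s[i];
}
```

As with the loop above, compile with gcc -S (plus an optimization flag) and examine the .s file to see which instructions were chosen.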
interesting, thanks!
banister