I am trying to build an application that uses pthreads and the __m128 SSE type. According to the GCC manual, the default stack alignment is 16 bytes, and __m128 requires 16-byte alignment.

My target CPU supports SSE. My GCC version doesn't support runtime stack realignment (e.g. -mstackrealign), and I cannot switch to any other GCC version.

My test application looks like:

#include <xmmintrin.h>
#include <pthread.h>
void *f(void *x){
   __m128 y;
   ...
}
int main(void){
  pthread_t p;
  pthread_create(&p, NULL, f, NULL);
}

The application raises an exception and exits. After some simple debugging (printf "%p", &y), I found that the variable y is not 16-byte aligned.
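The check described above can be sketched as a small helper. This is a minimal sketch, not part of the original test program: the names is_aligned_16 and require_aligned_16 are hypothetical, and the only assumption is that uintptr_t can hold the address.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if the address p is a multiple of 16, 0 otherwise. */
int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical debug helper: print the address and fail fast if it is
   misaligned, instead of faulting later inside an SSE instruction. */
void require_aligned_16(const char *name, const void *p)
{
    printf("%s is at %p\n", name, p);
    assert(is_aligned_16(p));
}
```

Inside f(), a call such as require_aligned_16("y", &y) would report the address of y and abort in a debug build if the low four bits are not zero.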

My question is: how can I realign the stack properly (to 16 bytes) without using any GCC flags or attributes (they don't help)? Should I use GCC inline assembly within the thread function f()?

+2  A: 

This shouldn't be happening in the first place, but to work around the problem you can try:

void *f(void *x)
{
   __m128 y __attribute__ ((aligned (16)));
   ...
}
Paul R
No, it doesn't help. The same problem.
psihodelia
My guess is you're doing this on Windows rather than a proper operating system? There is some good info here on working around this problem: http://www.sourceware.org/ml/pthreads-win32/2008/msg00056.html
Paul R
No, I work on Linux
psihodelia
It looks like this is a bug in old versions of gcc that seems to have been fixed around 2004. Is there some reason why you can't use a more up-to-date toolchain?
Paul R
Actually no, I cannot use another GCC version - we have a specific hardware/software environment.
psihodelia
I am trying to implement explicit stack adjustment using inline assembler.
psihodelia
+4  A: 

Allocate on the stack an array that is 15 bytes larger than sizeof(__m128), and use the first aligned address in that array. If you need several, allocate them in an array with a single 15-byte margin for alignment.

I do not remember whether allocating an unsigned char array makes you safe from strict aliasing optimizations by the compiler, or whether that only works the other way round.

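#include <stdint.h>
#include <xmmintrin.h>

void *f(void *x)
{
   /* 15 spare bytes guarantee a 16-byte-aligned address inside y */
   unsigned char y[sizeof(__m128)+15];
   /* round the start of y up to the next multiple of 16 */
   __m128 *py = (__m128*) ((((uintptr_t)y) + 15) & ~(uintptr_t)15);
   ...
}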
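A fuller sketch of the same over-allocate-and-round-up idea, under stated assumptions: the helper names align_up_16 and sum4_via_aligned_slot are hypothetical, and the target is assumed to be an x86 build with SSE enabled so that <xmmintrin.h> is usable.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Round p up to the next multiple of 16 (returns p if already aligned). */
void *align_up_16(void *p)
{
    return (void *)(((uintptr_t)p + 15u) & ~(uintptr_t)15);
}

/* Hypothetical example: keep one __m128 in an aligned slot carved out of
   an oversized byte buffer, then add up its four lanes. */
float sum4_via_aligned_slot(const float *in)
{
    unsigned char raw[sizeof(__m128) + 15];   /* 15 bytes of slack */
    __m128 *py = (__m128 *)align_up_16(raw);  /* first aligned address */
    float out[4];

    assert(((uintptr_t)py & 15u) == 0);       /* slot really is aligned */
    *py = _mm_loadu_ps(in);                   /* unaligned load from caller */
    _mm_storeu_ps(out, *py);                  /* read back from aligned slot */
    return out[0] + out[1] + out[2] + out[3];
}
```

The rounded pointer never moves more than 15 bytes past the start of raw, so the 16-byte slot always fits inside the buffer.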
Pascal Cuoq
You also might want to examine whether the overall thread stack is being allocated with a 16-byte alignment.
Donal Fellows
psihodelia
Unfortunately this forces the variable to be on the stack regardless of potential compiler optimisations (like keeping it in a register).
Paul R
What is ptr_t?
psihodelia
I'm guessing it's meant to be `uintptr_t`, but either way it's just an integer type that's big enough to hold a pointer.
Paul R
Pascal Cuoq
It doesn't work for me, because I have a lot of nested functions and local variables.
psihodelia
A: 

I have solved this problem. Here is my solution:

void another_function(){
   __m128 y;
   ...
}
void *f(void *x){
asm("pushl    %esp");
asm("subl    $16,%esp");
asm("andl    $-0x10,%esp");
another_function();
asm("popl %esp");
}

First, we grow the stack by 16 bytes. Second, we clear the least-significant nibble of the stack pointer. We preserve the stack pointer using push/pop instructions. We then call another function, whose local variables are all 16-byte aligned. All nested functions will also have their local variables 16-byte aligned.

And it works!
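The address arithmetic performed by the subl/andl pair can be modeled in plain C. This is only a model of the computation on a 32-bit stack pointer value, with a hypothetical function name; it does not replace the inline assembly and says nothing about whether restoring ESP with popl is safe.

```c
#include <assert.h>
#include <stdint.h>

/* Models `subl $16,%esp` followed by `andl $-0x10,%esp`. */
uint32_t realign_down_16(uint32_t esp)
{
    uint32_t result = esp;

    result -= 16;              /* subl $16,%esp: reserve 16 bytes      */
    result &= ~(uint32_t)15;   /* andl $-0x10,%esp: clear low 4 bits   */

    assert((result & 15u) == 0);                     /* 16-byte aligned */
    assert(esp - result >= 16 && esp - result <= 31); /* moved down     */
    return result;
}
```

For example, an input of 0xbffff3c4 becomes 0xbffff3b4 after the subtraction and 0xbffff3b0 after the mask, so the stack pointer ends up between 16 and 31 bytes below where it started.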

psihodelia
Seriously. UPDATE YOUR COMPILER. Don't be proud of yourself for putting Rube Goldberg devices in your code.
Frank Krueger
This code appears to save ESP on the stack, then move ESP somewhere else, then pop ESP. This will cause a random value to be popped into ESP. Doesn't this cause a crash? Or are you using a calling convention where ESP is saved somewhere else, perhaps into EBP, and restored at the end, making that POP superfluous?
user9876
1) I cannot update GCC: I have a specific run-time environment and a specific x86-compatible CPU. 2) No, why would it cause a crash? Saving ESP and then restoring it does not cause a crash or produce a random value. I have also tested the code above without pushl/popl and it is also OK. There is no special calling convention, and ESP is not saved anywhere else.
psihodelia
A: 

Another solution would be to use a padding function, which first aligns the stack and then calls f. So instead of calling f directly, you call pad, which pads the stack first and then calls f with an aligned stack.

The code would look like this:

#include <xmmintrin.h>
#include <pthread.h>

#define ALIGNMENT 16

void *f(void *x) {
    __m128 y;
    // other stuff
}

void * pad(void *val) {
    unsigned int x; // to get the current address from the stack
    unsigned char pad[ALIGNMENT - ((unsigned int) &x) % ALIGNMENT];
    return f(val);
}

int main(void){
    pthread_t p;
    pthread_create(&p, NULL, pad, NULL);
}
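The size of the padding array in pad can be worked through separately. This is a sketch of the arithmetic only, with a hypothetical helper name; whether the padding actually moves f's frame onto a 16-byte boundary depends on how the compiler lays out pad's own frame, which is an assumption here.

```c
#include <stdint.h>

#define ALIGNMENT 16

/* Number of padding bytes the expression in pad() yields for a local
   variable located at address addr. */
unsigned int pad_bytes(uintptr_t addr)
{
    return (unsigned int)(ALIGNMENT - addr % ALIGNMENT);
}
```

An address ending in 0xc4 gives 12 bytes of padding, while an address that is already a multiple of 16 gives a full 16 bytes rather than 0, so the array length is never zero.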
ablaeul