Hello. I use inline assembly heavily in a project where I need to call functions whose number of arguments is not known at compile time. I have mostly got it working, but on Linux (I don't recall having this problem on Windows) strange things like the following sometimes happen:

If I have something like

for(int i = 1; i >= 0; i--)
   asm("push %0"::"m"(someArray[i]));

It works.

If I have

for(int i = this->someVar; i >= 0; i--)
   asm("push %0"::"m"(someArray[i]));

and I am absolutely certain that someVar holds the value 1, it throws a segmentation fault.

Also if I have

int x = 1;
for(int i = x; i >= 0; i--)
   asm("push %0"::"m"(someArray[i]));

it works but

int x = this->someVar;
for(int i = x; i >= 0; i--)
    asm("push %0"::"m"(someArray[i]));

does not.

Stranger still, in some member functions this works without problems and in others it fails, all within the same object.

If someone can point me to some information that explains the problem, I would appreciate it.

Note that I really do have to push the arguments in a for loop, so avoiding the loop is not an option.

I also tried marking the inline assembly "volatile", but nothing changed.

+4  A: 

I can't tell exactly what the problem is, but try writing the whole loop in plain asm, like this:

asm{
     mov ax, this->var   ; load the counter once, before the loop starts
   loop1:
     ...
     dec ax
     cmp ax, 0
     je exit             ; counter reached zero: leave the loop
     jmp loop1
}

...

exit:

Also try making "var" static; that may help too.

oivoodoo
It can't be static because of how the code flows. I'm going to try making it assembly-only code. Thanks.
+4  A: 

Examine the disassembly. The most likely cause is that i and/or the variables holding the end value are being refetched from a fixed offset on the stack at each iteration of the for loop, and your push offsets the stack pointer from where the compiler expected it to be and so causes the wrong values to be fetched.

You could attempt various workarounds (e.g. declaring the local variables register), but unfortunately there is no good way to guarantee correct behaviour in C/C++ in this case. To avoid the problem, implement the loop yourself, as oivoodoo suggests.

moonshadow
Thanks, I'm going to give oivoodoo's suggestion a try.
+3  A: 

Here's my psychic debugging effort:

i and this are most likely stored on the stack, and on the 386 and up, machine code can refer to esp-relative memory locations directly, so the compiler may well produce instructions like

mov eax,[esp+8]

to get the value of this into the eax register. The problem is that your push operations mess with the stack pointer, so these hard coded accesses will access (increasingly) wrong memory locations after the first iteration.

Most likely, the simpler loop forms without this->someVar are optimised more thoroughly by the compiler and result in machine code that uses only registers and no esp-relative accesses, meaning they continue to work fine.

Once upon a time, all memory accesses to local variables and arguments were done via the ebp register, which is not changed by your inline assembly code. If you can find a compiler switch to force the use of ebp instead of esp, this may solve your problem.
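The difference between esp-relative and ebp-relative addressing can be illustrated with a toy model. This is only a sketch: `ToyStack`, `Reads`, and `read_local_after_pushes` are hypothetical names invented for the illustration, the "stack" is a plain array, and nothing here touches the real stack pointer. It models one local variable and then some extra pushes of the kind the inline asm in the question performs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy model of a downward-growing stack. NOT real machine state.
struct ToyStack {
    std::array<int, 16> mem{};
    std::size_t sp = 16;  // index of the top-of-stack slot (16 means empty)
    std::size_t bp = 16;  // frame pointer, fixed once the "prologue" has run

    void push(int v) { assert(sp > 0); mem[--sp] = v; }
    // Models an access like [esp+off].
    int read_sp(std::size_t off) const { return mem.at(sp + off); }
    // Models an access like [ebp-off].
    int read_bp(std::size_t off) const { return mem.at(bp - off); }
};

struct Reads { int via_sp; int via_bp; };

// Stores one local, performs `pushes` extra pushes the "compiler" does not
// know about, then reads the local back both ways.
Reads read_local_after_pushes(int local_value, int pushes) {
    assert(pushes >= 0 && pushes < 15);
    ToyStack s;
    s.bp = s.sp;          // prologue: frame pointer marks the frame base
    s.push(local_value);  // the local now lives at [bp-1], which is also [sp+0]
    for (int i = 0; i < pushes; ++i) {
        s.push(-1 - i);   // stands in for the "push" done by inline asm
    }
    return { s.read_sp(0), s.read_bp(1) };
}
```

With zero extra pushes both reads return the local. After any extra push, the sp-relative read returns whatever was pushed last, while the bp-relative read still finds the local, which is the behaviour described above.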

Warning: the compiler does not expect you to mess with the stack -- it expects that it knows at all times where the top-of-stack is. If you really want to dynamically push things on the stack, I would suggest writing the loop itself completely in assembly language as oivoodoo has done.

j_random_hacker
While I don't have your knowledge of the subject, what I have observed and what I do know lead me to think that something like what you describe is happening. Thanks for the useful comment.
A: 

If you know the limit to the number of args, you can just call it with a single function call with that number of arguments, aligning your actual arguments up to the end.

Warning: the x86_64 ABI passes some parameters in registers, which breaks this approach and your code too.
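A portable variant of the bounded-count idea is sketched below. Note that it is not exactly the padding scheme described above: instead of always passing the maximum number of arguments, it switches on the count and casts the pointer back to the matching signature, which keeps the call well defined in C++. `AnyFn`, `call_with_ints`, `add3`, and the limit of four `int` arguments are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>

// Generic function-pointer type used only for storage. It is always cast
// back to the callee's real signature before the call is made.
using AnyFn = void (*)();

// Hypothetical helper: calls fn with the first n ints from args (n <= 4).
// The caller must guarantee that fn really takes exactly n int parameters
// and returns int.
int call_with_ints(AnyFn fn, const int* args, std::size_t n) {
    assert(n <= 4);
    switch (n) {
        case 0: return reinterpret_cast<int (*)()>(fn)();
        case 1: return reinterpret_cast<int (*)(int)>(fn)(args[0]);
        case 2: return reinterpret_cast<int (*)(int, int)>(fn)(args[0], args[1]);
        case 3: return reinterpret_cast<int (*)(int, int, int)>(fn)(
                    args[0], args[1], args[2]);
        case 4: return reinterpret_cast<int (*)(int, int, int, int)>(fn)(
                    args[0], args[1], args[2], args[3]);
        default: return 0;  // unreachable when the assertion above holds
    }
}

// Example callee used to exercise the helper.
int add3(int x, int y, int z) { return x + y + z; }
```

Because the compiler generates each call itself, this does not depend on the stack layout or on whether the ABI passes arguments in registers. The cost is the hard upper bound on the argument count.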

Alex Brown
The number is unlimited and I want to keep it that way. I imagine your solution would also be less efficient. As for x86_64, it is not a problem because the application does not target it.
A: 

First, what is probably happening is that GCC on Linux is using the stack pointer to index your local variables rather than the frame pointer. This is an optimization that lets GCC use the frame pointer (BP on x86) as another general-purpose register and avoid a lot of frame setup code. A frame is essentially just the area between SP and BP that belongs to the current function. I'll bet that if you included a call to alloca with a size that you passed into this function, it would all get better, because that would force the compiler not to do this optimization.
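The alloca suggestion might look roughly like the sketch below. The function name `sum_copy_on_stack` and the 1024-element cap are made up for the example, and `<alloca.h>` is a non-standard header (available with glibc). Whether a runtime-sized alloca really changes how the rest of the function addresses its locals depends on the compiler and flags, so the disassembly should still be checked.

```cpp
#include <alloca.h>  // non-standard; assumed available (glibc)
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical example: copies `count` ints into a stack buffer whose size
// is only known at run time, then sums them. A runtime-sized alloca means
// the compiler cannot know the stack pointer's offset at compile time, so
// it normally keeps a frame pointer for this function.
int sum_copy_on_stack(const int* values, std::size_t count) {
    assert(values != nullptr);
    assert(count <= 1024);  // keep the stack allocation small
    int* scratch = static_cast<int*>(alloca(count * sizeof(int)));
    std::memcpy(scratch, values, count * sizeof(int));
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += scratch[i];
    }
    return total;
}
```

This only shows the shape of the workaround. It does not make it safe to leave an inline asm statement with a modified stack pointer.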

That being said, the bug is really in your code. Unless you really know what you're doing you should never exit an inline asm with the stack pointer different than what it was when you entered the inline asm. Compilers almost always think that they exclusively own the stack pointer. They depend on it staying the same so that they can use it to find where they have stored variables. You should also stay away from the frame pointer (BP).

The times when it is ok to mess with those are rare and usually for things like context switching code (changing from one thread or process to another).

nategoose