I'm working with the GNU assembler on i386, generally under 32-bit Linux (I'm also aiming for a solution that works under Cygwin).
I have a "stub" function:
.align 4
stub:
call *trampoline
.align 4
stub2:
trampoline:
...
The idea is that the bytes between stub and stub2 will be copied into allocated memory, along with a function pointer and some context data. When that memory is called, its first instruction will push the address of the next instruction and go to trampoline, which will read the address off the stack and use it to figure out the location of the accompanying data.
Now, stub gets assembled to:
ff 15 44 00 00 00 call *0x44
66 90 xchg %ax,%ax
This call encodes an absolute address rather than a relative offset, which is good because the address at which the copied call will end up is unknown. The padding has been turned into what I guess is a do-nothing operation, which is fine; it will never be executed anyway, because trampoline will rewrite the stack before jumping to the function pointer.
The problem is that the return address pushed by this call will point to the non-aligned xchg instruction, rather than to the aligned data just past it. This means trampoline needs to correct the alignment to find the data. This isn't a serious problem, but it would be slightly preferable to generate something like:
66 90 xchg %ax,%ax
ff 15 44 00 00 00 call *0x44
# Data will be placed starting here
So that the return address points directly at the data. The question is, then: how can I pad the instruction so that the end of it is aligned?
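To make the arithmetic concrete, here is a minimal C sketch. The function names align_up and pad_before are made up for illustration, and C is used only to show the calculation. align_up is the correction that trampoline currently has to apply to the return address; pad_before is the amount of padding that would have to go in front of the call for its end to be aligned, which presupposes that the instruction's length is already known when the padding is emitted.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align.
   Assumes align is a power of two. */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Number of padding bytes to emit at offset start so that an
   instruction of len bytes placed right after the padding ends on a
   multiple of align. Assumes align is nonzero. */
static uint32_t pad_before(uint32_t start, uint32_t len, uint32_t align)
{
    return (align - (start + len) % align) % align;
}
```

With the 6-byte call from the listing, pad_before(0, 6, 4) gives 2, which matches the 2-byte xchg %ax,%ax moved in front of the call, and align_up applied to a return address such as 0x1006 gives 0x1008.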
Edit
A little background (for those who haven't already guessed). I'm trying to implement closures. In the language,
(int -> int) make_curried_adder(int x)
{
    return int lambda (int y) { return x + y; };
}
(int -> int) plus7;
plus7 = make_curried_adder(7);
print("7 + 5 = ", plus7(5));
The { return x + y } is translated into a normal but anonymous function of two parameters. A block of memory is allocated and populated with the stub instructions, the address of the function, and the value 7. This block is returned by make_curried_adder and, when called, will push the additional argument 7 on the stack and then jump to the anonymous function.
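As a rough picture of the block described above, here is a C sketch of one possible layout. The struct, its field names, and fill_closure are hypothetical; the 8-byte stub size is taken from the disassembly (6-byte call plus 2 bytes of padding), and 32-bit fields stand in for i386 pointers and ints.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 6-byte call plus 2 bytes of padding, as in the disassembly. */
enum { STUB_SIZE = 8 };

/* Hypothetical layout of one closure block on a 32-bit target. */
struct closure_block {
    uint8_t  stub[STUB_SIZE]; /* copy of the bytes from stub up to stub2 */
    uint32_t function;        /* address of the anonymous function */
    uint32_t context;         /* captured value, e.g. 7 for plus7 */
};

/* Populate a block that the caller has already allocated. */
static void fill_closure(struct closure_block *blk, const uint8_t *stub_start,
                         uint32_t function, uint32_t context)
{
    memcpy(blk->stub, stub_start, STUB_SIZE);
    blk->function = function;
    blk->context = context;
}
```

The sketch leaves allocation out; for the block to be callable it would have to live in memory that is mapped executable.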
Update
I've accepted Pascal's answer, which is that assemblers tend to be written to run in a single pass. I think some assemblers do make more than one pass to deal with code like "call x; ... ; x: ...", which has a forward reference. (In fact, I wrote one a long time ago -- it would go back and fill in the correct address once it had reached x.) Or perhaps all such holes are left for the linker to close. Another problem with end-padding is that you need syntax to say "insert padding here so that the location over there is aligned". I can think of an algorithm that would work for simple cases like this one, but it may be such an obscure feature as to not be worth implementing. More complicated cases with nested padding might have contradictory results...
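The "go back and fill in the correct address" approach can be sketched roughly as follows. This is a toy C model, not a description of how GNU as works internally: struct emitter and its functions are hypothetical names, it supports a single forward label, and it uses the x86 "call rel32" encoding (opcode 0xe8, displacement relative to the end of the instruction).

```c
#include <stdint.h>

enum { CODE_MAX = 64, FIXUP_MAX = 8 };

/* Hypothetical one-pass emitter with a single forward label. */
struct emitter {
    uint8_t  code[CODE_MAX];
    uint32_t pos;                /* next free offset in code */
    uint32_t fixups[FIXUP_MAX];  /* offsets of unresolved rel32 fields */
    uint32_t nfixups;
};

/* Emit one byte; returns 0 on success, -1 if the buffer is full. */
static int emit_byte(struct emitter *e, uint8_t b)
{
    if (e->pos >= CODE_MAX)
        return -1;
    e->code[e->pos++] = b;
    return 0;
}

/* Emit "call rel32" to the not-yet-defined label, leaving a zero
   placeholder and remembering where it is. */
static int emit_call_forward(struct emitter *e)
{
    if (e->nfixups >= FIXUP_MAX || e->pos + 5 > CODE_MAX)
        return -1;
    e->code[e->pos++] = 0xe8;
    e->fixups[e->nfixups++] = e->pos;
    for (int i = 0; i < 4; i++)
        e->code[e->pos++] = 0;
    return 0;
}

/* Define the label at the current position, then go back and patch
   every recorded placeholder with the now-known displacement. */
static void define_label(struct emitter *e)
{
    for (uint32_t i = 0; i < e->nfixups; i++) {
        uint32_t at = e->fixups[i];
        uint32_t rel = e->pos - (at + 4); /* relative to end of the call */
        for (int k = 0; k < 4; k++)
            e->code[at + k] = (uint8_t)(rel >> (8 * k)); /* little-endian */
    }
    e->nfixups = 0;
}
```

Back-patching works here because the placeholder has a fixed size, so nothing emitted after it has to move. Padding whose size depends on code that comes later does not have that property.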