I'm working with GNU assembler on i386, generally under 32-bit Linux (I'm also aiming for a solution under Cygwin).

I have a "stub" function:

    .align 4
    stub:
        call *trampoline
        .align 4
    stub2:

    trampoline:
        ...

The idea is that the data between stub and stub2 will be copied into allocated memory, along with a function pointer and some context data. When the memory is called, the first instruction in it will push the address of the next instruction and go to trampoline which will read the address off the stack and figure out the location of the accompanying data.

Now, stub gets compiled to:

    ff 15 44 00 00 00      call *0x44
    66 90                  xchg %ax,%ax

This is a call to an absolute address, which is good because the address of the call is unknown. The padding has been turned into what I guess is a do-nothing operation, which is fine and anyway it will never be executed, because trampoline will rewrite the stack before jumping to the function pointer.

The problem is that the return address pushed by this call will point to the non-aligned xchg instruction, rather than the aligned data just past it. This means trampoline needs to correct the alignment to find the data. This isn't a serious problem but it would be slightly preferable to generate something like:

    66 90                  xchg %ax,%ax
    ff 15 44 00 00 00      call *0x44
    # Data will be placed starting here

So that the return address points directly at the data. The question is, then: how can I pad the instruction so that the end of it is aligned?

Edit: A little background (for those who haven't already guessed). I'm trying to implement closures. In the language,

    (int -> int) make_curried_adder(int x)
    {
        return int lambda (int y) { return x + y; };
    }

    (int -> int) plus7;
    plus7 = make_curried_adder(7);
    print("7 + 5 = ", plus7(5));

The { return x + y } is translated into a normal but anonymous function of two parameters. A block of memory is allocated and populated with the stub instructions, the address of the function, and the value 7. This is returned by make_curried_adder and when called will push the additional argument 7 on the stack then jump to the anonymous function.
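
As a rough illustration of the block described above, here is a minimal C sketch. The struct layout, the names `closure_block`, `anon_add` and `call_closure`, and the argument order (captured value first) are assumptions made for this example, not the actual implementation. The stub bytes are copied but never executed; `call_closure` merely stands in for what the stub and trampoline would do together.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout of one closure block: stub code, function, context. */
struct closure_block {
    unsigned char stub[8];      /* bytes copied from between stub and stub2 */
    int (*fn)(int x, int y);    /* the anonymous two-parameter function */
    int captured;               /* the captured value of x, e.g. 7 */
};

/* Stub bytes as shown in the disassembly above (call *0x44 plus padding). */
static const unsigned char stub_bytes[8] = {
    0xff, 0x15, 0x44, 0x00, 0x00, 0x00, 0x66, 0x90
};

/* The lambda body { return x + y; } as an ordinary function. */
static int anon_add(int x, int y) { return x + y; }

/* Allocate and populate a block, as make_curried_adder would. */
static struct closure_block *make_curried_adder(int x)
{
    struct closure_block *b = malloc(sizeof *b);
    assert(b != NULL);
    memcpy(b->stub, stub_bytes, sizeof b->stub);
    b->fn = anon_add;
    b->captured = x;
    return b;
}

/* Stand-in for the trampoline: supply the captured value, then call fn.
   The real version would be reached by executing b->stub instead. */
static int call_closure(const struct closure_block *b, int y)
{
    return b->fn(b->captured, y);
}
```

With these definitions, `call_closure(make_curried_adder(7), 5)` plays the role of `plus7(5)` in the example above.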

Update

I've accepted Pascal's answer, which is that assemblers tend to be written to run in a single pass. I think some assemblers do have more than one pass to deal with code like "call x; ... ; x: ...", which has a forward reference. (In fact I wrote one a long time ago -- it would go back and fill in the correct address once it had reached x.) Or perhaps all such holes are left for the linker to close. Another problem with end-padding is that you need syntax to say "insert padding *here* so that *there* is aligned". I can think of an algorithm that would work for simple cases like that, but it may be such an obscure feature as to not be worth implementing. More complicated cases with nested padding might have contradictory results...

+1  A: 

Is there a problem with adding your own xchg instruction prior to the call? Since you have an align just prior to stub, the alignment should be consistent.

Mark Ransom
Only insofar as a) I'd like to get the assembler to do that, because b) there's a layer of translation between the assembly code I write and the instructions that get generated, and I don't want to make too many assumptions along the lines of "call * always generates exactly 6 bytes".
Edmund
"call * always generates exactly 6 bytes" - that level of control is why you go to assembly in the first place. IMO there aren't any assembler directives to align the end of an instruction rather than the beginning.
Mark Ransom
+1  A: 

Unfortunately, most assemblers are simple one-pass translators, which limits the flexibility of the alignment directives they can offer. Even among all the alignment options that assemblers working in several passes could offer, many are neglected because they are too specific. Yours is one of those, I am afraid. It could work in a one-pass assembler as long as it's only one instruction you intend to move, but it's very specific.

I have seen the manual of a sophisticated multi-pass assembler that let you subtract the addresses of two labels to get the length of a sequence of instructions, and would let you insert a directive to insert a sequence of NOPs, say, (4 - this length modulo 4) in the place of your choice (as long as it remained possible to converge on a definite position for each instruction). I can't remember what assembler it was. Definitely not gas, which is one-pass as far as I know. It may have been the venerable A386.

Pascal Cuoq
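
To make the "(4 - this length modulo 4)" arithmetic concrete, here is a small C sketch. The function name `end_align_padding` is made up for this example. It adds one more modulo so that a length that is already a multiple of the alignment gets 0 bytes of padding rather than a full 4, which is an interpretation of the formula rather than something the answer states.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of padding to emit before a sequence of `len` bytes that starts on
   an `align`-byte boundary, so that the sequence ENDS on such a boundary.
   `align` must be a power of two. */
static size_t end_align_padding(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (align - len % align) % align;
}
```

For the 6-byte `call` in the question and 4-byte alignment this gives 2, which matches the two-byte `xchg %ax,%ax` shown in the desired output.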
I suspected as much -- I guess the assembler doesn't know how much padding to put before instruction #1 until it's figured out where to put instruction #n. It's understandable and I can manage without it. (The manual adjustment turns out to be as simple as "addl $3, %eax; andl $0xFFFFFFFC, %eax"...)
Edmund
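
For readers less used to the idiom, the two-instruction adjustment quoted in the comment above rounds an address up to the next multiple of 4. A C rendering, with the hypothetical names `align_up4` and `data_from_return_address` and 32-bit addresses assumed as on i386, might look like this:

```c
#include <assert.h>
#include <stdint.h>

/* Same effect as "addl $3, %eax; andl $0xFFFFFFFC, %eax":
   round up to a multiple of 4, leaving aligned values unchanged. */
static uint32_t align_up4(uint32_t addr)
{
    return (addr + 3u) & 0xFFFFFFFCu;
}

/* Given the return address pushed by the call in the stub, find the
   4-byte-aligned data that follows the padding. */
static uint32_t data_from_return_address(uint32_t ret)
{
    uint32_t data = align_up4(ret);
    assert(data % 4 == 0 && data - ret < 4);
    return data;
}
```

If the stub were copied to the (made-up) address 0x1000, the 6-byte call would push 0x1006, and the data would be found at 0x1008.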
+1  A: 

Have you considered putting the data before the code?

This way it is only a subtraction (of the length of the stub code plus some constant offset) to get to the address of the data, so it's one instruction instead of the two you were ready to accept. And I believe that gas will give you the length of the stub code (as the difference of two labels) without problem, since the labels are used after having been defined in this case.

Assuming the data is made of 32-bit words, there is also less padding involved compared to your initial solution (although I am not sure why there are so many .align directives in your initial solution, probably some orthogonal constraint that you didn't get into).

Pascal Cuoq
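
The data-before-code layout can be sketched in C as follows. The struct `data_then_code`, its field names, and the helper `block_from_return_address` are assumptions for illustration only. The single subtraction removes the length of the call instruction plus the fixed offset of the code within the block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block layout with the data placed before the stub code. */
struct data_then_code {
    int (*fn)(int, int);     /* function pointer */
    int captured;            /* context data */
    unsigned char code[8];   /* stub code, called through &block->code[0] */
};

/* Recover the block from the return address pushed by the stub's call.
   `call_len` is the length of the call instruction (stub2 - stub in
   assembler terms); the rest of the offset is a compile-time constant. */
static struct data_then_code *
block_from_return_address(unsigned char *ret, size_t call_len)
{
    assert(ret != NULL);
    return (struct data_then_code *)
        (ret - call_len - offsetof(struct data_then_code, code));
}
```

Because both terms are constants, the trampoline needs only one subtraction and no alignment fix-up.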
That means the pointer to this block of (data, code) won't be directly callable, which is a pity because it's nice to be able to pass it into C code as a callback. I could return block+sizeof(data), but then I suspect my GC (as yet unwritten) will not recognise that pointer as being dependent on the allocated block.
Edmund
Infix pointers are difficult to manage in general, but if you are going to write the GC yourself, this particular case of infix pointer can be taken into account. Whereas you typically have (header, data) blocks with the GC accessing the header with `pointer[-1]`, here you would have (data, header, code) blocks with the pointer pointing to the code, and the header indicating how far the block reaches in both directions. You could pass it C code because it would be recognizable that static C code is not in the heap, so the GC would not look for a header in this case.
Pascal Cuoq
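
One possible reading of the (data, header, code) scheme, sketched in C. The header encoding (two 16-bit word counts packed into one word) and the names `make_header`, `block_start` and `block_end` are invented for this example; a real GC would choose its own format.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t word;

/* Hypothetical header: number of words before the header (the data) in
   bits 16..31, number of words after it (the code) in bits 0..15. */
static word make_header(unsigned words_before, unsigned words_after)
{
    assert(words_before < 0x10000u && words_after < 0x10000u);
    return ((word)words_before << 16) | (word)words_after;
}

/* p is the callable pointer: it points just past the header, at the code.
   As with ordinary blocks, the GC reads the header at p[-1]. */
static word *block_start(word *p)
{
    word h = p[-1];
    return p - 1 - ((h >> 16) & 0xFFFFu);
}

/* One past the last word of the block. */
static word *block_end(word *p)
{
    word h = p[-1];
    return p + (h & 0xFFFFu);
}
```

Given only the infix pointer, the GC can then recover both ends of the allocation and treat the whole range as one live block.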