ansaurus

Question

How can I create a parallel stack and run a coroutine on it?

Answer 1

A:

Good learning reference: libcoroutine, especially their setjmp/longjmp implementation. I know it's not fun to use an existing library, but it can at least give you a general bearing on where you are going.

Yann Ramin 2010-06-22 02:30:37

Thank you very much! I'll investigate it (and especially `ucontext`, as it seems just on spot for the problem it solves without doing all the work).

zneak 2010-06-22 02:36:10

Answer 2

A:

Simon Tatham has an interesting implementation of coroutines in C that doesn't require any architecture-specific knowledge or stack fiddling. It's not exactly what you're after, but I thought it might nonetheless be of at least academic interest.
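For reference, the core of that trick can be sketched in a few lines of portable C. This is a minimal illustration in the style of his macros, not a copy of them: the names counter_state, CR_BEGIN, CR_YIELD, CR_END and counter_next are made up here, and keeping the state in a caller-supplied struct (rather than in static variables) is an assumption of this sketch.

```c
#include <assert.h>

/* Resumable function built on switch/case: the "program counter" is a
   line number stored between calls. All names here are hypothetical. */
typedef struct {
    int line;  /* where to resume; 0 means "start from the top" */
    int i;     /* loop variable, stored here because locals do not survive a return */
} counter_state;

#define CR_BEGIN(s)    switch ((s)->line) { case 0:
#define CR_YIELD(s, v) do { (s)->line = __LINE__; return (v); case __LINE__:; } while (0)
#define CR_END(s)      } (s)->line = -1

/* Yields 0, 1, ..., limit-1 on successive calls, then -1 from then on. */
int counter_next(counter_state *s, int limit)
{
    assert(s != 0);
    CR_BEGIN(s);
    for (s->i = 0; s->i < limit; s->i++) {
        CR_YIELD(s, s->i);  /* returns now; the next call jumps back in here */
    }
    CR_END(s);              /* line = -1 matches no case, so the body is skipped */
    return -1;
}
```

Starting from a zeroed counter_state, repeated calls with limit 3 give 0, 1, 2 and then -1. No second stack is involved, which is why anything that must survive a yield has to live in the struct.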

caf 2010-06-22 04:57:40

Answer 3

+4 A:

You are correct that PUSHA won't work on x64: it raises a #UD exception, because PUSHA only pushes the 16-bit or 32-bit general-purpose registers. See the Intel manuals for all the info you ever wanted to know.

Setting RIP is simple: jmp rax will set RIP to RAX. To retrieve RIP, you could either get it at compile time, if you already know all the coroutine exit origins, or get it at run time by making a call to the address immediately after the call instruction itself. Like this:

a:
call b
b:
pop rax

RAX will now be b. This works because CALL pushes the address of the next instruction. This technique works on IA32 as well (although I'd suppose there's a nicer way to do it on x64, as it supports RIP-relative addressing, but I don't know of one). Of course if you make a function coroutine_yield, it can just intercept the caller address :)
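For the coroutine_yield case, here is a minimal sketch of "intercepting the caller address" from C. It assumes GCC or Clang: __builtin_return_address is a compiler builtin, not standard C, and caller_address is a name made up for this illustration. (On x64, a LEA with a RIP-relative operand is another way to read the instruction pointer without touching the stack.)

```c
#include <assert.h>
#include <stddef.h>

/* noinline keeps a real CALL in the generated code, so there is a
   genuine return address to look at. */
__attribute__((noinline))
void *caller_address(void)
{
    /* Level 0 is this function's own return address: the value the
       CALL instruction pushed, i.e. the instruction after the call. */
    void *ra = __builtin_return_address(0);
    assert(ra != NULL);
    return ra;
}
```

A yield function written this way learns where to resume the coroutine without the call/pop pair.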

Since you can't push all the registers to the stack in a single instruction, I wouldn't recommend storing the coroutine state on the stack, as that complicates things anyways. I think the nicest thing to do would be to allocate a data structure for every coroutine instance.

Why are you zeroing things in function A? That's probably not necessary.

Here's how I would approach the entire thing, trying to make it as simple as possible:

Create a structure coroutine_state that holds the following:

initarg
arg
registers (also contains the flags)
caller_registers

Create a function:

coroutine_state* coroutine_init(void (*coro_func)(coroutine_state*), void* initarg);

where coro_func is a pointer to the coroutine function body.

This function does the following:

allocate a coroutine_state structure cs
assign initarg to cs.initarg; this will be the initial argument to the coroutine
assign coro_func to cs.registers.rip
copy current flags to cs.registers (not registers, only flags, as we need some sane flags to prevent an apocalypse)
allocate some decent-sized area for the coroutine's stack and assign its top (the highest address, since the stack grows downward on x86) to cs.registers.rsp
return the pointer to the allocated coroutine_state structure
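The structure and the init steps above could be sketched roughly as follows. This is only an illustration under assumptions: the register layout (coroutine_registers, gpr), the stack_base field, the 64 KiB stack size, the constant 0x202 standing in for "copy the current flags", and the helpers coroutine_free and example_body are all invented here. A real layout has to match whatever assembly saves and restores it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical register file; field order is arbitrary in this sketch. */
typedef struct {
    uint64_t rip, rsp, rflags;
    uint64_t gpr[15];              /* the other general-purpose registers */
} coroutine_registers;

typedef struct coroutine_state {
    void *initarg;
    void *arg;
    coroutine_registers registers;         /* the coroutine's saved state */
    coroutine_registers caller_registers;  /* the invoker's saved state */
    void *stack_base;                      /* assumption: kept so the stack can be freed */
} coroutine_state;

#define CORO_STACK_SIZE     (64 * 1024)  /* assumed size */
#define CORO_DEFAULT_RFLAGS 0x202u       /* assumed stand-in for the live flags */

coroutine_state *coroutine_init(void (*coro_func)(coroutine_state *), void *initarg)
{
    coroutine_state *cs = calloc(1, sizeof *cs);
    if (cs == NULL)
        return NULL;
    cs->stack_base = malloc(CORO_STACK_SIZE);
    if (cs->stack_base == NULL) {
        free(cs);
        return NULL;
    }
    cs->initarg = initarg;
    cs->registers.rip = (uint64_t)(uintptr_t)coro_func;
    cs->registers.rflags = CORO_DEFAULT_RFLAGS;
    /* The stack grows downward, so rsp starts at the top, 16-byte aligned. */
    uintptr_t top = (uintptr_t)cs->stack_base + CORO_STACK_SIZE;
    cs->registers.rsp = (uint64_t)(top & ~(uintptr_t)15);
    return cs;
}

void coroutine_free(coroutine_state *cs)
{
    if (cs != NULL) {
        free(cs->stack_base);
        free(cs);
    }
}

/* Hypothetical coroutine body, only here so there is something to pass in. */
void example_body(coroutine_state *cs)
{
    assert(cs != NULL);
}
```

Nothing here switches stacks yet; it only prepares the state that the resume step would load.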

Now we have another function:

void* coroutine_next(coroutine_state* cs, void* arg);

where cs is the pointer returned from coroutine_init, which represents a coroutine instance, and arg will be fed into the coroutine as it resumes execution.

This function is called by the coroutine invoker to pass in some new argument to the coroutine and resume it. The return value of this function is an arbitrary data structure returned (yielded) by the coroutine.

store all current flags/registers in cs.caller_registers except for RSP, which is handled by the stack pointer fix below
store the arg in cs.arg
fix the invoker stack pointer (cs.caller_registers.rsp): adding 2*sizeof(void*) will fix it if you're lucky, but you'd have to look this up to confirm it. You probably want this function to be stdcall so no registers are tampered with before calling it
mov rax, [rsp], assign RAX to cs.caller_registers.rip; explanation: unless your compiler is on crack, [RSP] will hold the address of the instruction that follows the call instruction that called this function (i.e. the return address)
load the flags and registers from cs.registers
jmp cs.registers.rip, effectively resuming execution of the coroutine

Note that we never return from this function; the coroutine we jump to "returns" for us (see coroutine_yield). Also note that inside this function you may run into many complications, such as the function prologue and epilogue generated by the C compiler, and perhaps register arguments. You have to take care of all this. Like I said, stdcall will save you lots of trouble, and I think gcc's -fomit-frame-pointer will remove the epilogue stuff.

The last function is declared as:

void coroutine_yield(void* ret);

This function is called inside the coroutine to "pause" execution of the coroutine and return to the caller of coroutine_next.

store flags/registers in cs.registers
fix coroutine stack pointer (cs.registers.rsp), once again, add 2*sizeof(void*) to it, and you want this function to be stdcall as well
mov rax, ret (let's just pretend all the functions in your compiler return their values in RAX)
load flags/registers from cs.caller_registers
jmp cs.caller_registers.rip; this essentially returns from the coroutine_next call on the coroutine invoker's stack frame, and since the return value is passed in RAX, we returned ret. Let's just say that if ret is NULL, the coroutine has terminated; otherwise it's an arbitrary data structure.

So to recap, you initialize a coroutine using coroutine_init, then you can repeatedly invoke the instantiated coroutine with coroutine_next.

The coroutine's function itself is declared: void my_coro(coroutine_state* cs)

cs.initarg holds the initial function argument (think constructor). Each time my_coro is resumed, cs.arg holds the argument that was passed to coroutine_next. This is how the coroutine invoker communicates with the coroutine. Finally, every time the coroutine wants to pause itself, it calls coroutine_yield and passes one argument to it, which is the return value to the coroutine invoker.

Okay, you may now think "that's easy!", but I left out all the complications of loading the registers and flags in the correct order while still maintaining a non-corrupt stack frame and somehow keeping the address of your coroutine data structure (you just overwrote all your registers), in a thread-safe manner. For that part you will need to find out how your compiler works internally... good luck :)
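To see the three functions working together without hand-written assembly, here is a rough sketch that swaps in POSIX ucontext (getcontext/makecontext/swapcontext) for the manual register save and restore. That is a different technique from the one described above, used only to illustrate the API shape. It assumes a platform that still ships <ucontext.h> (for example Linux with glibc) and a C11 compiler for _Thread_local. The ret, func, stack and done fields, trampoline, coroutine_free and running_total are names invented for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define CORO_STACK_SIZE (64 * 1024)  /* assumed size */

typedef struct coroutine_state {
    void *initarg;                /* constructor-style argument */
    void *arg;                    /* latest value passed to coroutine_next */
    void *ret;                    /* latest value passed to coroutine_yield */
    ucontext_t registers;         /* saved coroutine context */
    ucontext_t caller_registers;  /* saved invoker context */
    void (*func)(struct coroutine_state *);
    void *stack;
    int done;
} coroutine_state;

/* One "current coroutine" per thread, so its address never has to
   survive in a register across a switch. */
static _Thread_local coroutine_state *current;

static void trampoline(void)
{
    coroutine_state *cs = current;
    cs->func(cs);
    cs->done = 1;
    cs->ret = NULL;  /* NULL means "terminated" */
    /* Returning from here switches to uc_link, i.e. caller_registers. */
}

coroutine_state *coroutine_init(void (*coro_func)(coroutine_state *), void *initarg)
{
    coroutine_state *cs = calloc(1, sizeof *cs);
    if (cs == NULL)
        return NULL;
    cs->stack = malloc(CORO_STACK_SIZE);
    if (cs->stack == NULL || getcontext(&cs->registers) != 0) {
        free(cs->stack);
        free(cs);
        return NULL;
    }
    cs->func = coro_func;
    cs->initarg = initarg;
    cs->registers.uc_stack.ss_sp = cs->stack;  /* the parallel stack */
    cs->registers.uc_stack.ss_size = CORO_STACK_SIZE;
    cs->registers.uc_link = &cs->caller_registers;
    makecontext(&cs->registers, trampoline, 0);
    return cs;
}

void *coroutine_next(coroutine_state *cs, void *arg)
{
    if (cs->done)
        return NULL;
    coroutine_state *prev = current;
    cs->arg = arg;
    current = cs;
    swapcontext(&cs->caller_registers, &cs->registers);  /* resume coroutine */
    current = prev;
    return cs->ret;
}

void coroutine_yield(void *ret)
{
    coroutine_state *cs = current;
    assert(cs != NULL);  /* must be called from inside a coroutine */
    cs->ret = ret;
    swapcontext(&cs->registers, &cs->caller_registers);  /* back to invoker */
}

void coroutine_free(coroutine_state *cs)
{
    if (cs != NULL) {
        free(cs->stack);
        free(cs);
    }
}

/* Hypothetical body: adds each int passed in to *initarg and yields the
   total; a 0 ends the coroutine. */
void running_total(coroutine_state *cs)
{
    long *total = cs->initarg;
    for (;;) {
        int n = *(int *)cs->arg;
        if (n == 0)
            return;
        *total += n;
        coroutine_yield(total);
    }
}
```

With a long accumulator passed as initarg, feeding 2 and then 5 through coroutine_next yields a pointer to the accumulator each time (holding 2, then 7), and feeding 0 makes coroutine_next return NULL. The thread-local pointer only works as long as a coroutine is always resumed on the thread that last ran it.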

Longpoke 2010-06-22 05:38:20

Woah, so many things to say and so few characters allowed in each comment! First things first. Yeah, zeroing registers is probably not useful. Poking around, I found Apple's implementation of getcontext/setcontext from the XNU project: http://www.opensource.apple.com/source/Libc/Libc-594.1.4/x86_64/gen/ They grab `rip` by dereferencing `rsp` (without touching the stack for the whole function), which makes sense because we want returning calls to start where `getcontext` will leave, not somewhere in `getcontext`. They also leave alone non-GPR registers.

zneak 2010-06-22 06:37:33

Why do we need to copy the flags? Not copying them will just leave them untouched, no? For the arguments being passed by register, on Mac OS x86 there isn't much to do about it. __Eight__ registers are dedicated to argument passing. On the other hand, this can be an advantage because it means I have to save less stuff (they're caller-save).

zneak 2010-06-22 06:37:49

@zneak: well, say an executing coroutine function sets the direction flag a certain way, then yields. Upon resuming, I'd guess you want to restore that flag. (Then again, nobody usually saves flags when calling normal functions...) I'm not sure. There is a define for RFLAGS in the link you pasted, but they never touch it. It all depends on how your compiler and runtime are set up.

Longpoke 2010-06-22 13:53:49

Actually, they save it when building in DEBUG and leave it alone in RELEASE. I don't think they ever restore it though (maybe that's really for debugging purposes, as in "checking what was in rflags" at a certain moment through getcontext).

zneak 2010-06-22 15:30:52
