views:

79

answers:

2

I was reading the paper "Garbage Collector in an Uncooperative Environment" and wondering how hard it would be to implement it. The paper describes a need to collect all addresses from the processor (in addition to the stack). The stack part seems intuitive. Is there any way to collect addresses from the registers other than enumerating each register explicitly in assembly? Let's assume x86_64 on a POSIX-like system such as linux or mac.

SetJmp

A: 

Since Boehm and Weiser actually implemented their GC, then a basic source of information is the source code of that implementation (it is opensource).

To collect the register values, you may want to subvert the setjmp() function, which saves a copy of the registers in a custom structure (at least those registers which are supposed to be preserved across function calls). But that structure is not standardized (its contents are nominally opaque) and setjmp() may be specially handled by the C compiler, making it a bit delicate to use for anything other than a longjmp() (which is already quite hard as it is). A piece of inline assembly seems much easier and safer.

The first hard part in the GC implementation seems to be able to reliably detect the start and end of stacks (note the plural: there may be threads, each with its own stack). This requires delving into ill-documented details of OS ABI. When my desktop system was an Alpha machine running FreeBSD, the Boehm-Weiser implementation could not run on it (although it supported Linux on the same processor).

The second hard part will be when trying to go generational, trapping write accesses by playing with page access rights. This again will require reading some documentation of questionable existence, and some inline assembly.

Thomas Pornin
A: 

I think on x86_86 they use the flushrs assembly instruction to put the registers on the stack. I am sure someone on stack overflow will correct me if this is wrong.

SetJmp