views:

38

answers:

1

Hello, I'm writing a semi-accurate garbage collector, and I'm wondering if there is a way to accurately detect the boundaries of the system-allocated stack for a given thread.

I need this so I can scan the stack for roots (pointers), and my current approach is to save the stack pointer when entering a new thread and when entering main(), and then to save it again for each thread when starting a garbage collection, using a global lock.

(I know that this approach is not ideal in the long run, since it causes unnecessary locking, but for now it is acceptable until the basics are up)

But I would really like to have a more "secure" way of detecting the boundaries, because it is quite easy for roots to 'escape' this mechanism (depending on the thread implementation -- pthreads is easy to manage, but not so with OpenMP or Grand Central Dispatch).

It would be even more awesome if there was a portable way to do this, but I don't expect that to be the case. I'm currently working on Mac OS X 10.6, but answers for any platform are welcome.

+1  A: 

You could use the VM mechanism to write-protect the end of the stack, and expand on a VM write trap. Then you'd know the boundary of the stack within a stack page.

But I'm not sure I understand the objection to simply inspecting the current value of the SP for the existing thread, * if you make the "big stack" assumption * typically assumed by the small-number-of-threads parallelism community.

See this SO article for a discussion about a world model where you don't have "big stacks". Then you can't build a GC that simply scans "the stack", because it is heap allocated on demand (e.g., function entry). Now you need to scan all the stack segments that might be live.

Ira Baxter
Thank you for this input. Your suggestion to use virtual memory write traps is clever, but you're probably right that it's unnecessary. Also, it would only work when growing the stack, and not when the required stack space is decreased, which makes it unsuitable for GC purposes (since it will leak a lot of unused roots).What I'm mostly worried about is thread implementations that make different assumptions about the layout and size of the stack, but it may well be that my worries are unfounded.
Simon Ask Ulsnes
The thread implementations don't have much choice, given linear address spaces. Each thread must be started up, with some fixed size maximum linear amount of stack. Both Linux and MS do this, with a (pre)allocation of typically 1Mb (set as a default in the object file), but any thread fork can specify the amount. The case I find interesting is when the preallocation is small, and function calls allocate *new* linear stack frame segments noncontiguously with respect to the caller's stack. Then a simple linear scan of "the" stack can't work, becaure there are lots of pieces.
Ira Baxter