Most of the GCs are so called "stop the world" GCs - at some predefined points ("Gc points" - eg. call points, jumps, returns) each thread will check if there exists some other thread that wants to do a GC cycle. Once all of the threads are stopped GC cycle will run.
Of course, there are other possibilities - realtime, incremental, concurrent (and much more) GCs also exist - search the web (you'll mostly find published papers) or simply buy a book about GCs
As for stack scanning there are a few ways:
precise scanning:
tagged stack - you basically keep 2 stacks around - one with values and one one with "tags". What tags are depends on what you need, but it will mostly be a "is reference"/"is not reference" mark
stacks with no tags - basically if you have a language with strict types for which you can, at every point in time (but more commonly in every "GC point") know what the typeson the stack are. I'll give you a example used with a simple interpreter (I just made it up):
no-return function XY (int):
load_param 1
ipush 1
iadd
call Z (assume: int function Z (int, int))
new some_object
call Y
If we define our GC poins to be call/new then there are possibly 3 points at which we might need to know our stack types (at the XY function entry the stack is considered "empty"):
- call Z - load_param load a int param, ipush pushes an int - 2 ints on the stack, nothing for our GC to do here
- new - call Z used 2 integers from the stack and places an int
- call Y - new placed a reference, so now we have a int and a reference, GC has to know about that reference
Note that I said the stack is "empty" on function entry - well it really isn't but you can analyze every function individually, and then just move up the "call stack" (you have a return pointer somewhere so you know where you have to return - return-1 is some call for which you can also get an image of the stack. Repeat until you reach the top).
There are two ways to remember this information (for a non-tagged stack):
- generate a table for each GC point
- generate a code fragment for each GC point (the code fragment itself will mark the referenced objects)
As for when you do collect this information, it could be precompiled or just in time.
This can also be applied to machine stacks, but things get a bit complicated since you might have to track the registers as well.
You should be able to find online a few good papers about tagless stacks as well.
There are further modifications where you add the eg. "liveness" information to your data (ok, there is a reference on the stack, but if there is no code down the instruction stream that uses it I can treat it as unreachable now)
- conservative collection (as opposed to precise scanning) - where you ask yourself "is this value on the stack, interpreted as a pointer, a reference", if it is then it is "alive". This can of course leak if there is eg. an integer that looks as a pointer (the memory will be freed if the integer ever changes, but could also leek forever). Most c/c++ collectors are implemented this way so search in that direction if you're curious.
Does each process require it's own
garbage collection routine?
There is nothing that would require this but I'd say it is common - for simplicity I'd have a different GC instance (but only one for all threads) for each of my processes. I imagine there could be a shared memory allocator between the processes - the only upside I see is maybe a possibility to better manage memory fragmentation (because you control more memory) but the complexity (inteproces communication/synchronization - yuck), amount of shared data and lack of independence become problems. I am guessing here and I have no real experience with this kinds of GCs (or even if they exist) - it just seems like common sense to me.
Is it a
good idea to run the garbage collector
in a separate thread/process from the
actual program?
Well, it depends. It should be a good idea to keep it in another thread, but what you get by this depends - do you want to keep you GC simple ("stop the world" - all other threads are paused so it pretty much doesn't matter in which thread we execute our GC, but its nicer if it has it's own thread), or do you have special requirements (eq. your threads are real time and should not be stopped for a long while, then you'll have a different GC thread and use a realtime/incremental GC algorithm).
It all depends on what you need, but whatever you do, remember - keep it as simple as possible.
And I almost forgot, there are some great alternatives to writing your own GC from scratch - eg. see LLVM, they claim "leveraging these tools, it is straightforward to emit type-accurate stack maps for your runtime in as little as ~100 lines of C++ code." (precompiled code only, no code JIT generation support as of jet).
Also, you might take a look at some of the code for some java VMs (eg. phoneME Advanced VM (CVM), and kaffe are pretty readable as far as I remember).
Disclaimer: I once implemented (as a student project) a precise, stop the world, tag-less, mark&sweep GC - everything written here is a result of what I can remember from the experience and should be correct, but it may not be "best practice". Corrections are welcome.