views:

856

answers:

5

How to do garbage collection in a program that consist from multiple threads or processes?

I also like to know how can I scan the stack from each of those threads and processes?

Does each process require it's own garbage collection routine? Is it a good idea to run the garbage collector in a separate thread/process from the actual program?

+1  A: 

Most every GC environment, for instance Java and .NET, halt all threads during garbage collection.

+6  A: 

Most of the GCs are so called "stop the world" GCs - at some predefined points ("Gc points" - eg. call points, jumps, returns) each thread will check if there exists some other thread that wants to do a GC cycle. Once all of the threads are stopped GC cycle will run.

Of course, there are other possibilities - realtime, incremental, concurrent (and much more) GCs also exist - search the web (you'll mostly find published papers) or simply buy a book about GCs

As for stack scanning there are a few ways:

  • precise scanning:

    • tagged stack - you basically keep 2 stacks around - one with values and one one with "tags". What tags are depends on what you need, but it will mostly be a "is reference"/"is not reference" mark

    • stacks with no tags - basically if you have a language with strict types for which you can, at every point in time (but more commonly in every "GC point") know what the typeson the stack are. I'll give you a example used with a simple interpreter (I just made it up):

no-return function XY (int):
 load_param 1 
 ipush 1
 iadd
 call Z (assume: int function Z (int, int))
 new some_object
 call Y

If we define our GC poins to be call/new then there are possibly 3 points at which we might need to know our stack types (at the XY function entry the stack is considered "empty"):

  • call Z - load_param load a int param, ipush pushes an int - 2 ints on the stack, nothing for our GC to do here
  • new - call Z used 2 integers from the stack and places an int
  • call Y - new placed a reference, so now we have a int and a reference, GC has to know about that reference

Note that I said the stack is "empty" on function entry - well it really isn't but you can analyze every function individually, and then just move up the "call stack" (you have a return pointer somewhere so you know where you have to return - return-1 is some call for which you can also get an image of the stack. Repeat until you reach the top).

There are two ways to remember this information (for a non-tagged stack):

  • generate a table for each GC point
  • generate a code fragment for each GC point (the code fragment itself will mark the referenced objects)

As for when you do collect this information, it could be precompiled or just in time.

This can also be applied to machine stacks, but things get a bit complicated since you might have to track the registers as well.

You should be able to find online a few good papers about tagless stacks as well.

There are further modifications where you add the eg. "liveness" information to your data (ok, there is a reference on the stack, but if there is no code down the instruction stream that uses it I can treat it as unreachable now)

  • conservative collection (as opposed to precise scanning) - where you ask yourself "is this value on the stack, interpreted as a pointer, a reference", if it is then it is "alive". This can of course leak if there is eg. an integer that looks as a pointer (the memory will be freed if the integer ever changes, but could also leek forever). Most c/c++ collectors are implemented this way so search in that direction if you're curious.

Does each process require it's own garbage collection routine?

There is nothing that would require this but I'd say it is common - for simplicity I'd have a different GC instance (but only one for all threads) for each of my processes. I imagine there could be a shared memory allocator between the processes - the only upside I see is maybe a possibility to better manage memory fragmentation (because you control more memory) but the complexity (inteproces communication/synchronization - yuck), amount of shared data and lack of independence become problems. I am guessing here and I have no real experience with this kinds of GCs (or even if they exist) - it just seems like common sense to me.

Is it a good idea to run the garbage collector in a separate thread/process from the actual program?

Well, it depends. It should be a good idea to keep it in another thread, but what you get by this depends - do you want to keep you GC simple ("stop the world" - all other threads are paused so it pretty much doesn't matter in which thread we execute our GC, but its nicer if it has it's own thread), or do you have special requirements (eq. your threads are real time and should not be stopped for a long while, then you'll have a different GC thread and use a realtime/incremental GC algorithm).

It all depends on what you need, but whatever you do, remember - keep it as simple as possible.

And I almost forgot, there are some great alternatives to writing your own GC from scratch - eg. see LLVM, they claim "leveraging these tools, it is straightforward to emit type-accurate stack maps for your runtime in as little as ~100 lines of C++ code." (precompiled code only, no code JIT generation support as of jet). Also, you might take a look at some of the code for some java VMs (eg. phoneME Advanced VM (CVM), and kaffe are pretty readable as far as I remember).

Disclaimer: I once implemented (as a student project) a precise, stop the world, tag-less, mark&sweep GC - everything written here is a result of what I can remember from the experience and should be correct, but it may not be "best practice". Corrections are welcome.

Hrvoje Prgeša
+2  A: 

Why do you want to do your own GC? Or are you just trying to learn about GC in general? The Jones and Lin book Hrvoje recommends is quite good, and the Wikipedia article doesn't look bad.

Java after 1.4.2 has provided an incremental hybrid GC that doesn't force a "stop the world" halt; we needed to do that to deal with device oriented Java in, eg, the Java ME world. There' a pretty good extended paper on it on the Sun java website.

Charlie Martin
I'm going to implement a smalltalk dialect that runs on x86. I've now hit the memory management problem and I thought it was the simplest to take care by myself about it.
Cheery
+1  A: 

Here is the blog for Maoni Stephens. She is the current main developer for the .Net GC.

Here is a post on concurrent GC which talks about how the .Net GC works when in that case.

Steve Steiner
+2  A: 

You have to collect once per address-space. Multiple threads run in the same address-space, so there needs to be one GC which takes care of that. For a program that works by spawning different processes each of which runs in its own address-space, there can be a GC per process.

In the multi-threaded case, it makes sense to run GC as another thread. That way, you can fiddle with the priority of that thread to ensure a mostly-smooth operation of the whole program. For a single-threaded process, it's easiest to have a GC which hooks into the 'normal' memory-management routines (allocation, in particular), but you can have two threads - one thread for the original process and a GC thread - to, again, ensure semi-smooth performance.

The stop-the-world-collectors are the simplest - mark-and-sweep, mark-and-compact, stop-and-copy, you name it. A GC which hooks into memory management routines in a single-threaded program is stop-the-world by nature. In the multi-threaded case, the GC thread can be given special privileges by the thread scheduler (make it uninterruptible once it decides to run), which allows you to run a stop-the-world GC from such a privileged thread.

If the GC could be interrupted by the mutator (ie, the rest of the program), in particular because it is just another thread which gets no special treatment at all, you'll need a GC which can handle the interference of the mutator. In this case, you're looking at an incremental GC. An IGC can be used in a single-threading, hook-into-memory-management-routines setup, where it may be interrupted ie by a timeout to ensure somewhat-smooth operation of the whole system; it can also be used in a multi-threaded system, where it simply competes with other threads for running time.

How you could find the stack(s) of a program or of all threads of a program I can't tell you, but the layout of these structures should be documented for every operating system. It might make sense to grab the Boehm GC and scan the sources for hints.

While Jones/Lins is being shipped to you, spend some time on http://www.memorymanagement.org/. For yet more info, as Charly Martin said above, the folks at Sun have done some amazing research and development in the area of garbage collection, just like the members and associates of the IBM Jikes RVM team.

Edit: After reading your comment to Charlie Martin, let me give you a much terser advice: Hook up the Boem GC to your system and be done with it. It is easy to write a garbage collector. It is nigh impossible to write a garbage collector that is correct, efficient, fast, well-behaved, well tuned and robust. Use an existing GC and move on to the interesting parts of your project. Don't get stuck with the GC, or, worse, have a badly implemented GC which will annoy you to no end.

doppelfish
if it's near impossible to write such garbage collector then neither Boehm GC is such. I'd have used boehm already if I wanted to do so. I'll try get along with zero depedencies right now.
Cheery
The Boehm GC didn't get as good as it is in a few days or weeks. And it's written by an expert of the field. Also note that both SUN and IBM have a whole group of specialists devoted to the GC of their JVMs.
doppelfish