views:

466

answers:

7

I'm interested in how garbage collection works. I've read up on how some work such as mark-and-sweep, stop-and-copy, generational GC, etc... I'd like to experiment with implementing some of these and comparing their behaviors. What's a good way to get started experimenting with my own? Ideally something in C, Java or Python (although the last two are themselves garbage-collected so it seems it'd be hard to use them...)

A: 

Implement your own JVM. Nothing fancy, just the basics. There are lots and lots of programs/compilers/languages which generate JVM code so you have plenty of material to test with.

JesperE
+2  A: 

the .NET runtime and Java runtime are now open source, so you can experiment with the runtime it self if you want to play around with a current support programming language. However if you wanted to do this yourself you would probably have to create your own runtime which has it's own language.

Nick Berardi
+12  A: 

Never played with it myself, but the one that always gets mentioned for use with C/C++ is Hans Boehm's.

Michael Burr
Just to add that this is considered *the* garbage collector of choice if you want to stay within C/C++ instead of using a VM ...
Remo.D
+1, I use it frequently when I'm working on something extremely complex where a double free() would spell disaster.
Tim Post
That's correct, it's also the GC that is being used by the gcc project.
none
A: 

Fun to play with, but garbage collection is a dark art. Not to make it work, but to make it work with the efficiency that the newest VMs do.

We're talking multi-stage and magic that makes allocations speed more comparable to stack allocations than malloc.

The whole eden concept rocks.

You might want to read some whitepapers on the techniques used.

Here's an article that seems to have a good overview (just from a quick google/scan)

http://www.devx.com/Java/Article/21977/0/page/1

Bill K
-1: "garbage collection is a dark art". Sorry but I think it is counter-productive to make statements like this when you can write a decent garbage collector in 100 lines of code.
Jon Harrop
@Jon Harrop Doing a simple GC is easy (exactly what I said in my answer), what's a dark art is making it work with the efficiency of the newer VMs. Do you really disagree with that? Have you looked into what the new VMs do?
Bill K
@Bill: I blogged this: http://flyingfrogblog.blogspot.com/2010/09/are-multicore-capable-garbage.html
Jon Harrop
@Jon I was not talking about concurrent either--that's not particularly hard. I was talking about multi-generational GC where Eden is filled, copied and refilled to reflect the short lifecycle of new objects, long term objects that hang around are promoted to a longer term storage and may finally be moved to an "old" generation where they are unlikely to move or be changed all through analyzing usage patterns. If you can do this in a few hundred lines or can honestly call it anything but a black art (due to the necessity of years of trial and error tweaking/analysis) I'd be very impressed
Bill K
@Bill: VM design is actually a hobby of mine. I have read hundreds of research papers on them and even implemented my own VM (HLVM). I am afraid that a lot of people are put off by this perception that it is a "dark art" when, in fact, you can easily create a competitively-performant VM by yourself these days. There is a lot of great literature out there now...
Jon Harrop
@Bill: Traditional generational collection does not have to be hard either. Firstly, it requires a moving collector so you must save and reload all locally-held references across GC safe points. Secondly, you allocate using pointer bumps into a nursery generation that is roughly the size of your L2 cache. When it is full, you traverse it and copy all reachable values into an old generation (you can just use `malloc` and `free` to manage the old generation) and sweep the entire nursery in O(1) by resetting the pointer (this is why it is so fast when a high proportion of values are short lived).
Jon Harrop
@Bill: Finally, you must handle the possibility of pointers from old back to young. They can be introduced when a reference is written into the old generation and is solved by injecting a write barrier. The simplest solution is to perform a nursery collection whenever the replacement reference would have referred to an object in the young generation. For better performance with imperative languages, keep a remembered set. You don't have to do anything as low-level as card marking to get decent performance.
Jon Harrop
A: 

Slava Pestov who developes the Factor programming language wrote a number of posts about his implementation of a garbage collector. You can find posts on it with this link:

http://factor-language.blogspot.com/search?q=garbage+collection

in particular start from the post on Sunday, September 24, 2006.

Jamie Love
+1  A: 

Parrot has multiple garbage collectors.

Max Lybbert
That link broke. It probably should be http://docs.parrot.org/parrot/latest/html/docs/pdds/pdd09_gc.pod.html
DarenW
A: 

MMTk contains a large set of high performance garbage collectors. It includes:

  • Copying collectors
  • Tracing collectors
  • Reference counting collectors

It also has:

  • Stop the world collectors
  • Concurrent collectors

Since it is a research platform it has some advance collectors such as the generation reference counting collector.

Luke Quinane