ansaurus

Question

Tri-Color Incremental Updating GC: Does it need to scan each stack twice?

Answer 1

+1 A:

A bit of terminology:

Let me give some names so that explanations are clearer.

A variable is any slot for data, which may contain a pointer and may change over time. This includes global variables, local variables, CPU registers and fields in allocated objects.

In a tricolor incremental or concurrent GC, there are three types of variables:

the true roots, which are always accessible (CPU registers, global variables);
the fast variables, which are scanned in a stop-the-world fashion;
the slow variables, which are handled with the colors. Slow variables are fields in colored objects.

The "true roots" and "fast variables" will be hereafter collectively called roots.

The application threads are called the mutators because they change the contents of variables.

With an incremental or concurrent GC, GC pauses occur regularly. The world is stopped (mutators are paused), and the roots are scanned. This scan reveals a number of references to colored objects. Object colors are adjusted accordingly (such white objects are made grey).

When the GC is incremental, some object scanning activity takes place: some grey objects are scanned (and painted black), greying referenced white objects. This activity (the "marking") is maintained for some time, but not necessarily as long as there are grey objects. At some point, the marking stops and the world is awakened. The GC is called "incremental" because the GC cycle is performed in small increments, interleaved with mutator activity.

In a concurrent GC, scanning of grey objects occurs concurrently with mutator activity. The world is then awakened as soon as the roots have been scanned. With a concurrent GC, access barriers are a tad complex to implement because they must handle concurrent access from the GC thread; but at a conceptual level, this is not very different from an incremental GC. A concurrent GC can be viewed as an optimization over incremental GC, which takes advantage of the presence of multiple CPU cores (a concurrent GC has little advantage over an incremental GC when there is only one core).

Roots need not be protected by an access barrier, since they are scanned with the world stopped. The GC mark phase ends when the following conditions are simultaneously met:

the roots have just been scanned;
all objects are either black or white, but not grey.

so this situation can occur only during a pause. At that point, the sweep phase begins, during which white objects are released. The sweep can be done incrementally or concurrently; objects created during the sweep are immediately painted black. When the sweep is finished, a new GC mark phase can take place: objects (which are all black at that point) are all repainted white (this is done atomically by simply changing the way color bits are interpreted).

Variable Classification:

With that being said, I can now answer to your question. With the description above, the question becomes: what are the roots ? This is actually up to the implementation; there are several possibilities, and trade-offs.

True roots must always be scanned; true roots are the CPU register contents and the global variables. Note that the stacks are not true roots; only the current stack frame pointer is.

Since fast variables are accessed without barriers, it is customary to make stack frames fast variables (i.e. roots as well). This is because while write accesses are rare system-wide, they are quite common in the local variables. It has been measured (on some Lisp programs) that about 99% of writes (of a pointer value) have a local variable as target.

Fast variables are often extended even further, in the case of a generational GC: the "young generation" consists in a special allocation area for new objects, limited in length, and scanned as fast variables. The bright side of fast variables is fast access (hence the name); the downside is that all these fast variables may be scanned only during a pause (the world is stopped). There is a trade-off on the size of the fast variables, which often translates to a limit on the young generation size. A larger young generation promotes average performance (by reducing the number of access barriers) at the cost of longer pauses.

At the other extreme, you may have no fast variable at all, and no root but the true roots. The stack frames are then handled as objects, each with their own color. Pauses are then minimal (a mere snapshot of a dozen register) but barriers must be used even for access to local variables. This is expensive, but has some advantages:

Hard guarantees on pause times can be made. This is difficult if stack frames are roots, because each new thread has its own stack, so the roots total size may grow up to arbitrary amounts as new threads are launched. If only CPU registers and global variables (no more than a few dozens in typical cases, and their number is known at compilation time) are roots, then pauses can be kept very short.
This allows for dynamic allocation of stack frames in the heap. This is needed if you play with co-routines and continuations, as with Scheme's call/cc primitive. In such a case, frames are no longer handled as a pure "stack". Proper handling of continuations in a GC-aware language mostly requires that function frames be allocated dynamically.

It is possible to make stack frames non-root while keeping a young generation as root. Guarantees on pause times can still be made (depending on the young generation size, which is fixed) and some trickery can be applied to make sure that stack frames are in the young generation when their function is active. This can ensure barrier-free access to local variables. None of this is really free, but it can be made efficient enough for most purposes.

Another Conceptual View:

Another way to view root-handling is the following: roots are the variables for which the tricolor rule (no black-to-white pointer) is not maintained at all times; these variables are allowed to be mutated without constraint. But they must be brought back in line regularly, by stopping the world and scanning them.

In practice, the mutators are racing with the GC. Mutators create new objects, and point to them; each pause creates new grey objects. In a concurrent or incremental GC, if you let the mutators play with roots for too long, then each pause may create a big batch of new grey objects. In the worst case, the GC cannot scan objects fast enough to keep up with the rate of grey object creation. This is an issue because white objects can be released only during the sweep phase, which is reached only if at some point the GC may complete its marking. A usual implementation strategy for an incremental GC is to scan grey objects, during each pause, for a total size which is proportional to the total size of roots. Thus, pause time remains bounded by the roots total size, and if the proportionality factor is well balanced then it can be guaranteed that the GC will ultimately terminate is marking phase and enter the sweeping phase.

In a concurrent GC, things are a bit more complex, because the mutators roam freely in the wild. A possible implementation would make a little bit of incremental marking while the world is still stopped.

Bibliography:

Garbage Collection: Algorithms for Automatic Dynamic Memory Management: a must-read book on garbage collection.

Thomas Pornin 2010-03-02 18:22:05

Not inexpensive that book ;-) But looks like one should read it. There are way too little "easy to understand" description of GC algorithms on the net - BTW, checked this as accepted reply because it basically answers my question. Short answer: GC cylce terminates only during a stopped world and after the stack has been scanned again.

Mecki 2010-03-02 18:58:46

ansaurus

tags:

views:

answers:

Tri-Color Incremental Updating GC: Does it need to scan each stack twice?

How a Tri-Color GC Works

The Problem

The Question

related questions