I'm reading Steve Yegge's "Dynamic Languages Strike Back" talk, and in it he sort of criticizes mark-and-sweep GCs (about 5-10 percent through that link, the "Pigs attempt's to fly" slide) What's wrong with them?
He's contrasting it to mark-compact:
Generational garbage collectors is the best answer I've got for that, because it reduces the pauses, and frankly, the garbage collectors for all the [new] dynamic languages today are crap. They're mark-and-sweep, or they're reference counted.
Plain mark & sweep GCs aren't so good because they have a problem of heap fragmentation. With high allocation levels common of GC-enabled languages, this usually becomes an issue faster than it does in e.g. C++, where a lot of objects just live on the stack.
That said, mark-compact is really mark & sweep with compacting tackled on it, so the terminology could be better. Non-compacting collectors are normally called "conservative" to distinguish them.
Here's the context of the quote:
Generational garbage collectors is the best answer I've got for that, because it reduces the pauses, and frankly, the garbage collectors for all the [new] dynamic languages today are crap. They're mark-and-sweep, or they're reference counted.
From the quote, he appears to be talking about fairly primitive GCs which aren't generational. Generational GCs can still be mark and sweep, but they have a lot less to mark most of the time, which makes them a lot faster than "mark and sweep the world every time".
Assuming that's what he meant, I agree - but he could have put it more clearly. Bear in mind that this was a talk rather than a doctoral thesis though - coming up with the clearest possible way of expressing yourself "on the hoof" is kinda tricky :)
Here's a high-level bullet point comparison of the various techniques mentioned in the referenced quotation (plus "mark-and-compact" ... which is slightly different to generational collection.)
The properties of reference counting collection are:
- PRO - garbage is reclaimed immediately (apart from cycles)
- PRO - garbage collection pauses are smaller, and minimal if you can defer updating the "free space" data structure.
- CON - reference counts need to be adjusted on most pointer write operation
- CON - free space is never compacted
- CON - because free space is not compacted, a "free space" data structure must be maintained which increases allocation and deallocation costs.
- CON - cyclic garbage is not collected, unless the application breaks the cycle by hand.
- CON - updating reference counts in a multi-threaded app is extra expensive.
For classic mark-and-sweep:
- PRO - no pointer write overhead
- PRO - cyclic data is collected
- PRO - storage management concurrency bottlenecks can be avoided (apart from GC)
- CON - stop-the-world garbage collection
- CON - free space is never compacted
- CON - because free space is not compacted, a "free space" data structure must be maintained which increases allocation and deallocation costs.
Classical mark-and-sweep is sometimes modified so that the sweep phase compacts the free space by "sliding" non-garbage objects. This is fairly complicated but:
- PRO - no pointer write overhead
- PRO - cyclic data is collected
- PRO - storage management concurrency bottlenecks can be easily avoided (apart from GC)
- CON - stop-the-world garbage collection
- PRO - free space is compacted, so allocation is cheap
- CON - the compact phase is rather expensive
Modern collectors (including generational collectors) are based on mark-and-copy. The idea is that the collector traces objects in a "from space" copying them to a "to space". When it is done, the "to space" has a contiguous chunk of free space at the end which can be used for allocating new objects. The old "from space" is put on one side for the next time the garbage collector runs. A generational collector is one where there are multiple spaces, that are collected at different rates. (This is a drastic over-simplification.)
- PRO - no pointer write overhead
- PRO - cyclic data is collected
- PRO - storage management concurrency bottlenecks can be easily avoided (apart from GC)
- CON - stop-the-world garbage collection, though this can be mitigated at the cost of some runtime overheads (e.g. using write barriers)
- PRO - with generational collectors, you usually GC just part of the heap
- PRO - smaller GC pauses (most of the time) because GC rate is faster and because of generations
- PRO - free space is compacted, so allocation is cheap
- PRO - compaction comes more cheaply than with a sliding compacter
- CON - you need to reserve an extra object space for the collector.