To add to the debate here.
There are known issues with garbage collection, and understanding them helps understanding why there is none in C++.
1. Performance ?
The first complaint is often about performance, but most people don't really realize what they are talking about. As illustrated by Martin Beckett
the problem may not be performance per se, but the predictability of performance.
There are currently 2 families of GC that are widely deployed:
- Mark-And-Sweep kind
- Reference-Counting kind
The Mark And Sweep
is faster (less impact on overall performance) but it suffers from a "freeze the world" syndrom: ie when the GC kicks in, everything else is stopped until the GC has made its cleanup. If you wish to build a server that answers in a few milliseconds... some transactions will not live up to your expectations :)
The problem of Reference Counting
is different: reference-counting adds overhead, especially in Multi-Threading environments because you need to have an atomic count. Furthermore there is the problem of reference cycles so you need a clever algorithm to detect those cycles and eliminate them (generally implement by a "freeze the world" too, though less frequent). In general, as of today, this kind (even though normally more responsive or rather, freezing less often) is slower than the Mark And Sweep
.
I have a seen a paper by Eiffel implementers that were trying to implement a Reference Counting
Garbage Collector that would have a similar global performance to Mark And Sweep
without the "Freeze The World" aspect. It required a separate thread for the GC (typical). The algorithm was a bit frightening (at the end) but the paper made a good job of introducing the concepts one at a time and showing the evolution of the algorithm from the "simple" version to the full-fledged one. Recommended reading if only I could put my hands back on the PDF file...
2. Resources Acquisition Is Initialization
It's a common idiom in C++
that you will wrap the ownership of resources within an object to ensure that they are properly released. It's mostly used for memory since we don't have garbage collection, but it's also useful nonetheless for many other situations:
- locks (multi-thread, file handle, ...)
- connections (to a database, another server, ...)
The idea is to properly control the lifetime of the object:
- it should be alive as long as you need it
- it should be killed when you're done with it
The problem of GC is that if it helps with the former and ultimately guarantees that later... this "ultimate" may not be sufficient. If you release a lock, you'd really like that it be released now, so that it does not block any further calls!
Languages with GC have two work arounds:
- don't use GC when stack allocation is sufficient: it's normally for performance issues, but in our case it really helps since the scope defines the lifetime
using
construct... but it's explicit (weak) RAII while in C++ RAII is implicit so that the user CANNOT unwittingly make the error (by omitting the using
keyword)
3. Smart Pointers
Smart pointers often appear as a silver bullet to handle memory in C++
. Often times I have heard: we don't need GC after all, since we have smart pointers.
One could not be more wrong.
Smart pointers do help: auto_ptr
and unique_ptr
use RAII concepts, extremely useful indeed. They are so simple that you can write them by yourself quite easily.
When one need to share ownership however it gets more difficult: you might share among multiple threads and there are a few subtle issues with the handling of the count. Therefore, one naturally goes toward shared_ptr
.
It's great, that's what Boost for after all, but it's not a silver bullet. In fact, the main issue with shared_ptr
is that it emulates a GC implemented by Reference Counting
but you need to implement the cycle detection all by yourself... Urg
Of course there is this weak_ptr
thingy, but I have unfortunately already seen memory leaks despite the use of shared_ptr
because of those cycles... and when you are in a Multi Threaded environment, it's extremely difficult to detect!
4. What's the solution ?
There is no silver bullet, but as always, it's definitely feasible. In the absence of GC one need to be clear on ownership:
- prefer having a single owner at one given time, if possible
- if not, make sure that your class diagram does not have any cycle pertaining to ownership and break them with subtle application of
weak_ptr
So indeed, it would be great to have a GC... however it's no trivial issue. And in the mean time, we just need to roll up our sleeves.