I was thinking about garbage collection on the way home, and I began wondering: why does the garbage collector totally freeze execution of a program? Personally, I would have designed it to block only threads that try to allocate a new object, while leaving already-running threads alone. I can't imagine any situation where this would be a problem compared with how garbage collectors currently work.

+2  A: 

Because that's the only way it can ensure that the references it is going to clean up are not being used by anyone else.

If it didn't freeze execution, it couldn't ensure that.

Kico Lobo
Why? If an object is holding a reference when the collector scans it, then a reference is held; if it's not holding one when the scan comes along, then no reference is held and there is no way it can get a reference back.
Martin
Why do you say "if it's not holding one when the scan comes along then no reference is held and there is no way it can get a reference back"? Look at Jon Skeet's code snippet: x doesn't hold any reference at first, but later it does.
Laurence Gonsalves
Indeed, Jon wrote that example after I wrote that comment; he is quite correct.
Martin
+7  A: 

In addition to what Kico Lobo said, garbage collectors can also move things around in memory.

Therefore, they have to block not just threads that write to memory, but also threads that read from it.

Which is every thread.
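
To make this concrete, here is a minimal sketch of a compacting move. It is purely illustrative: the class and member names are invented, and no real runtime exposes its heap as managed arrays like this. It shows the window in which a concurrently reading thread would see stale data:

 // Hypothetical illustration only; not any real runtime's internals.
 class CompactingHeapSketch
 {
     object[] fromSpace = new object[1024]; // region being evacuated
     object[] toSpace   = new object[1024]; // region being filled

     // Move the object stored at fromIndex into toSpace at toIndex.
     int Evacuate(int fromIndex, int toIndex)
     {
         toSpace[toIndex] = fromSpace[fromIndex]; // step 1: copy the object
         // a thread reading via fromIndex here still finds the object...
         fromSpace[fromIndex] = null;             // step 2: clear the old slot
         // ...but a thread reading via fromIndex here finds nothing,
         // unless every reader was paused or redirected first.
         return toIndex;                          // callers must be re-pointed
     }
 }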

R. Bemrose
+12  A: 

Modern garbage collectors (in .NET and Java, anyway) don't actually "stop the world" - they do all kinds of clever things to collect concurrently.

However, you might want to consider a situation like this:

 object x = null;          // suppose the GC examines x at this point: it sees null
 object y = new object();  // the new object is reachable only through y
 ...
 x = y;                    // the only reference is transferred to x
 y = null;                 // if the GC examines y at this point, it sees null too

Now, suppose the GC looks at x, then the lines below the ... run, and then the GC looks at y - it will have seen null in both variables, so it never spots a reference to the object... but the object should still be live.

Basically there needs to be a certain amount of pausing in order to get a consistent set of references. Then there's compaction, reference reassignment etc. However, it isn't nearly as bad as it used to be in terms of requiring everything to be stopped for the whole of the GC cycle. It does, however, get painful to think about :)

Jon Skeet
Ahh ok, as ever the clever chaps at Microsoft are ahead of me. I suspected there was a case like this, I just couldn't quite construct one. Do you know of any good resources (particularly papers which I can view freely) to read up about this?
Martin
Here's one about HotSpot's GC (somewhat old, admittedly): http://www.ibm.com/developerworks/java/library/j-jtp11253/
Jon Skeet
Also have a look at this: http://blogs.sun.com/theplanetarium/entry/java_vm_trying_a_new and this: http://blogs.sun.com/jonthecollector/entry/our_collectors
Jon Skeet
Thank you, these should keep me entertained for a while ;)
Martin
+2  A: 

Most GCs stop execution because objects can move in memory during a collection cycle (at least with most reasonably recent designs). That means either reading or writing almost any object at the wrong time can cause a problem.

There are collectors that have been designed around the idea of just blocking reads (or writes) to the specific parts of memory being modified at a given time, so as long as execution only uses objects that aren't (currently) being moved around, it can proceed unhindered. The problem is that most typical hardware doesn't provide efficient support for this, so even though they work in principle, they're fairly inefficient in practice. There has been at least one attempt at adapting that type of algorithm to use the write protection available in a typical paging unit, but I'm not aware of its having been used for much other than research and experimentation.
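
As a rough illustration of the read-blocking idea, here is a conceptual sketch of a read barrier. All the names are invented; as described above, real collectors implement the check in compiled code or via hardware page protection rather than in a library like this:

 // Conceptual sketch of a read barrier; names invented for illustration.
 using System.Collections.Generic;

 static class ReadBarrierSketch
 {
     const int RegionSize = 256;                   // assumed region granularity
     static readonly HashSet<int> beingMoved = new HashSet<int>();

     // Every heap read goes through this check, so only accesses that
     // touch a region currently being relocated have to wait.
     static object Read(object[] heap, int index)
     {
         int region = index / RegionSize;
         if (beingMoved.Contains(region))
             WaitUntilMoved(region);               // block just this access
         return heap[index];                       // region is stable again
     }

     static void WaitUntilMoved(int region)
     {
         // a real collector would park the thread here until the region
         // has finished moving; omitted in this sketch
     }
 }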

The primary alternative is to make the collector incremental -- i.e. have it do only a small amount of work at a time, so even though other execution gets stopped, it only has to stop for a little while at any given time.
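
The incremental idea might look roughly like this sketch (the structure and names are invented for illustration): the collector keeps a queue of objects still to scan and does only a bounded amount of marking work per step, so each individual pause stays short:

 // Illustrative sketch of incremental marking; not a real collector.
 using System.Collections.Generic;

 class IncrementalGcSketch
 {
     readonly Queue<object> pending = new Queue<object>(); // objects left to scan

     // Called at safe points in the program; does at most 'budget' units
     // of marking work, then lets the program resume.
     public void Step(int budget)
     {
         while (budget-- > 0 && pending.Count > 0)
         {
             object obj = pending.Dequeue();
             // mark obj and enqueue its not-yet-marked children (omitted)
         }
     }
 }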

With multi-core machines becoming so common, however, I'd expect to see more work put into garbage collection algorithms that can run in parallel with other execution. Up until recently, the primary emphasis was on minimizing the total time/effort spent on garbage collection. The growing number of cores available is likely to (often) mean that doing more total work in garbage collection may be easily justified, if doing so allows the mainstream of the code to run with fewer hindrances.

Edit: You might want to read Paul Wilson's Survey of Uniprocessor Garbage Collection Techniques. This isn't definitive (especially any more, given its age), but it's at least a reasonable starting point.

Jerry Coffin