views: 392
answers: 7

AFAIK when a GC is doing its thing the VM blocks all running threads -- or at least when it is compacting the heap. Is this the case in modern implementations of the CLR and the JVM (production versions as of January 2010)? Please do not provide basic links on GC as I understand the rudimentary workings.

I assume global locking is the case because, while compaction is moving objects, references into the heap may be invalid, and it seems simplest just to lock the entire heap (i.e., indirectly, by blocking all threads). I can imagine more robust mechanisms, but KISS often prevails.

If I am incorrect, my question would be answered by a simple explanation of the strategy used to minimise blocking. If my assumption is correct, please provide some insight on the following two questions:

  1. If this is indeed the behaviour, how do heavyweight enterprise engines like JBOSS and Glassfish maintain a consistently high TPS rate? I did some googling on JBOSS and was expecting to find something like an APACHE-style memory allocator suited for web processing.

  2. In the face of NUMA-esque architectures (potentially the near future), this sounds like a disaster unless processes are bound to a CPU for both their threads and their memory allocation.

A: 

There are a number of GC algorithms available with Java, not all of which block all running threads. For example, you can use -XX:+UseConcMarkSweepGC which runs concurrently with the app (for collection of the tenured generation).
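
If you want to see which collectors are actually in use and how much time they spend collecting, you can query the GC MXBeans at runtime. A minimal sketch (the bean names in the comment are what HotSpot typically reports when CMS is enabled; other collectors report different names):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcStats {
        public static void main(String[] args) {
            // Run with e.g. -XX:+UseConcMarkSweepGC; HotSpot then typically exposes
            // beans named "ParNew" (young generation) and "ConcurrentMarkSweep" (tenured).
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.println(gc.getName()
                        + ": collections=" + gc.getCollectionCount()
                        + ", accumulated time=" + gc.getCollectionTime() + " ms");
            }
        }
    }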

Matthew Wilson
They all block the running threads occasionally when they need to do global GC. The concurrent GC does try to do most of its work in a concurrent fashion though. <http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#cms.pauses>
Paul Wagland
a.k.a. (mostly) Concurrent Mark Sweep. The (mostly) is significant.
Tom Hawtin - tackline
There are fully concurrent GCs for other JVMs.
Jon Harrop
+4  A: 

The answer is that this depends on the garbage collection algorithms used. In some cases, you are correct that all threads are stopped during GC. In other cases, you are incorrect: garbage collection proceeds while normal threads are running. To understand how GCs achieve that, you need a detailed understanding of the theory and terminology of garbage collectors, combined with an understanding of the specific collector. It is simply not amenable to a simple explanation.

Oh yes, and it is worth pointing out that many modern collectors don't have a compaction phase per se. Rather, they work by copying live objects to a new "space" and discarding the old "space" when they are done.
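
For illustration only, here is a toy sketch of the copy-to-a-new-space idea (Cheney-style scanning), using ordinary Java objects as a stand-in for raw heap memory; a real collector works on raw memory and pointers, so treat this purely as a model of the algorithm:

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.List;

    final class ToyCopyingCollector {
        static final class Obj {
            Obj[] fields;
            Obj forwardedTo;                       // set once the object has been copied
            Obj(int n) { fields = new Obj[n]; }
        }

        /** Copies everything reachable from the roots into a fresh "to-space". */
        static List<Obj> collect(List<Obj> roots) {
            List<Obj> toSpace = new ArrayList<Obj>();
            ArrayDeque<Obj> scanQueue = new ArrayDeque<Obj>();
            for (int i = 0; i < roots.size(); i++) {
                roots.set(i, copy(roots.get(i), toSpace, scanQueue));
            }
            // Cheney scan: fix up references inside objects that were already copied.
            while (!scanQueue.isEmpty()) {
                Obj o = scanQueue.poll();
                for (int i = 0; i < o.fields.length; i++) {
                    if (o.fields[i] != null) {
                        o.fields[i] = copy(o.fields[i], toSpace, scanQueue);
                    }
                }
            }
            return toSpace;                        // the old "from-space" is simply discarded
        }

        private static Obj copy(Obj o, List<Obj> toSpace, ArrayDeque<Obj> queue) {
            if (o.forwardedTo != null) return o.forwardedTo;   // already moved
            Obj c = new Obj(o.fields.length);
            System.arraycopy(o.fields, 0, c.fields, 0, o.fields.length);
            o.forwardedTo = c;                     // leave a forwarding pointer behind
            toSpace.add(c);
            queue.add(c);
            return c;
        }
    }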

If you really want to understand how garbage collectors work, I recommend "Garbage Collection: Algorithms for Automatic Dynamic Memory Management" by Richard Jones.

EDIT: in response to the OP's comment ...

It depends. In the case of the Java 6 concurrent collector, there are two pauses during the marking of the roots (including stacks), and then marking / copying of other objects proceeds in parallel. For other kinds of concurrent collector, read or write barriers are used while the collector is running to trap situations where the collector and application threads would otherwise interfere with each other. I don't have my copy of [Jones] here right now, but I also recall that it is possible to make the "stop the world" interval negligible ... at the cost of more expensive pointer operations and/or not collecting all garbage.
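
As a rough illustration of the write-barrier idea (a schematic model, not how HotSpot or any other VM actually implements it): every reference store marks a small "card" covering the written address as dirty, and the collector only has to revisit dirty cards to catch updates the application made while collection was running concurrently:

    // Schematic card-marking write barrier; addresses and card size are made up.
    // In a real VM this is a couple of machine instructions emitted on every
    // reference store, not a Java method call.
    final class CardTable {
        private static final int CARD_SHIFT = 9;          // 512-byte cards
        private final byte[] cards;

        CardTable(int heapSizeBytes) {
            cards = new byte[(heapSizeBytes >>> CARD_SHIFT) + 1];
        }

        // Conceptually invoked on every "obj.field = value" store.
        void onReferenceStore(int writtenAddress) {
            cards[writtenAddress >>> CARD_SHIFT] = 1;      // mark the card dirty
        }

        // The collector rescans only the dirty cards instead of the whole heap.
        boolean isDirty(int address) {
            return cards[address >>> CARD_SHIFT] != 0;
        }
    }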

Stephen C
(+1) will definitely buy that book. A point I would like to know more about is what happens after the "copy to new space and update" procedure? Presumably all references are updated by blocking all threads? Or is it an iterative procedure?
Hassan Syed
As far as I know, even the most concurrent garbage collectors "stop the world" for some of their work, although most tasks do indeed run concurrently. The CMS collector in JVM 6, however, does use all available CPUs during the stop-the-world phase.
edgar.holleis
@edgar Thank you, "CMS collector" is a useful search term -- I will look up the details when I have time. It seems it is as I thought -- there is no getting around the "stop the world" part. It would be interesting to have some hypotheses, or some yard-stick numbers, on how these global lock phases affect the TPS rates of serious production servers (like the ones I mentioned in the question).
Hassan Syed
About TPS: you have it all wrong! You can generally trade latency (measured in ms, lower is better) against throughput (measured in TPS, higher is better). Low-latency algorithms generally have lower throughput than the alternative: high-throughput algorithms, which generally also introduce higher latencies. If you're primarily worried about TPS, as is typical for server loads, stop worrying about the length of the stop-the-world phase. It will be comparatively long, but not so long as to impact your TPS.
edgar.holleis
@Hassan - @edgar is right. The way to maximize TPS is to use a GC that has the lowest total overheads, not the one that minimizes pause times.
Stephen C
@edgar and @stephen that puts things into perspective.
Hassan Syed
+1  A: 

You are correct that the garbage collector will have to pause all the application threads. This pause time can be reduced with the Sun JVM by using the concurrent collector, which performs some of the work without stopping the application, but it still has to pause the application threads.

See here http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#par_gc and here http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#cms for details on how the Sun JVM manages garbage collection in the latest JVMs.

For web applications I don't think this is an issue. Since user requests should complete within a short time (< 1 s), any temporary objects allocated to service a request should never leave the young generation (provided it is sized appropriately; see the example below), where they are cleaned up very efficiently. Other data with longer lifecycles, such as user sessions, will hang around longer and can impact the time spent on major GC events.
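
For example, the generations can be sized explicitly with the standard HotSpot options; the figures below are purely illustrative (and myapp.jar is a placeholder), so they would need tuning against real measurements:

    java -Xms512m -Xmx512m \
         -XX:NewSize=192m -XX:MaxNewSize=192m \
         -XX:SurvivorRatio=8 \
         -XX:+UseConcMarkSweepGC \
         -verbose:gc -XX:+PrintGCDetails \
         -jar myapp.jar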

On high-TPS applications a common strategy is to run multiple instances of the application server, either on the same or separate hardware, using session affinity and load balancing. By doing this the individual heap size per JVM is kept smaller, which reduces the pause times for GC when performing a major collection. In general the database becomes the bottleneck rather than the application or the JVM.

The closest you might find to the concept of a web-specific memory allocator in J2EE is the object/instance pooling performed by frameworks and application servers. For example, in JBOSS you have EJB pools and database connection pools. However, these objects are usually pooled because of their high creation cost rather than because of garbage collection overhead.
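
A bare-bones sketch of the pooling idea (a hypothetical helper, nothing like the actual JBOSS pool implementations): the expensive objects are created once up front and then borrowed and returned across requests:

    import java.util.List;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Minimal instance pool: creation cost is paid once, instances are reused.
    final class SimplePool<T> {
        private final BlockingQueue<T> idle;

        SimplePool(List<T> preCreatedInstances) {
            idle = new ArrayBlockingQueue<T>(preCreatedInstances.size(), false, preCreatedInstances);
        }

        T borrow() throws InterruptedException {
            return idle.take();                 // blocks until an instance is free
        }

        void release(T instance) {
            idle.offer(instance);               // hand the instance back for reuse
        }
    }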

Aaron
(+1) good discussion -- I distill that I need to look at the concurrent `minor cycle` collector to see what its effects are. The assumption is that the TPS characteristics of a web server depend heavily on a request's allocations remaining young-generation objects.
Hassan Syed
@Hassan Yes, the paragraph about the young generation is an assumption and will depend on your application, although you can tune the size of the generations in the JVM. The point was that you should not worry too much about the temporary objects created while processing a request, as the current generational garbage collectors are very good at cleaning these up. Running multiple JVMs can help with the longer-lived objects such as user sessions, as they can be distributed between the JVMs, meaning each JVM has less to check during a major collection.
Aaron
-1: GCs do not have to pause all threads (aka "stop the world").
Jon Harrop
A: 

Current state-of-the-art garbage collection for Java still involves occasional "stop the world" pauses. The G1 GC introduced in Java 6u14 does most of its work concurrently; however, when memory is really low and it needs to compact the heap, it has to ensure that no-one messes with the heap underneath it. This requires that nothing else is allowed to proceed. To find out more about the G1 GC, look at the presentations from Sun.
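
For what it's worth, G1 was still experimental in that release and had to be enabled explicitly; if I remember correctly the flags were along these lines (myapp.jar is a placeholder):

    java -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -jar myapp.jar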

Paul Wagland
-1: State-of-the-art garbage collection for Java has been fully concurrent for years now. There are *hard* real-time JVMs out there...
Jon Harrop
+1  A: 

I believe IBM have performed some research on improving GC performance in multi-core systems, which includes work on reducing or eliminating the 'stop everything' issue.

E.g. see: A Parallel, Incremental and Concurrent GC for Servers (pdf)

Or google something like "concurrent garbage collection ibm"

locster
+1  A: 

Azul Systems have a really amazing GC (and JVM): http://www.azulsystems.com/

Viktor Klang
I don't need a proprietary GC. I'm just trying to understand the effects of "stop the world" -- and preferably not from explanations from marketing literature.
Hassan Syed
But they have really gotten far in avoiding stop-the-world. Stop-the-world is a Bad Thing(tm).
Viktor Klang
Especially if you are a company that sells 864-core Java machines as Azul does.
Jörg W Mittag
A: 

AFAIK when a GC is doing its thing the VM blocks all running threads -- or at least when it is compacting the heap. Is this the case in modern implementations of the CLR and the JVM (production versions as of January 2010)?

Both Sun's Hotspot JVM and Microsoft's CLR have concurrent GCs that stop the world only for short phases (to get a self-consistent snapshot of the global roots from which all live data are reachable) and not for entire collection cycles. I'm not sure about their implementations of compaction, but that is a very rare occurrence.

If this is indeed the behaviour, how do heavyweight enterprise engines like JBOSS and Glassfish maintain a consistently high TPS rate?

The latency of those engines is orders of magnitude longer than the time taken to stop the world. Also, latencies are quoted as, for example, a 95th percentile, meaning the latency will be below the quoted figure only 95% of the time. So compactions are unlikely to affect quoted latencies.
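
If you want rough yard-stick numbers for your own setup, one crude approach is a "hiccup" meter: a thread that sleeps for a fixed interval and reports any iteration that overran badly. Long overruns are usually GC pauses, though OS scheduling noise shows up too, so treat the output as an estimate only:

    // Crude stop-the-world "hiccup" meter. Run it inside (or alongside) the
    // application under load and compare the reported pauses with your latencies.
    public final class PauseSniffer {
        public static void main(String[] args) throws InterruptedException {
            final long runForNanos = 60L * 1000 * 1000 * 1000;   // observe for one minute
            final long end = System.nanoTime() + runForNanos;
            long worstMs = 0;
            while (System.nanoTime() < end) {
                long t0 = System.nanoTime();
                Thread.sleep(1);
                long hiccupMs = (System.nanoTime() - t0) / 1000000 - 1;  // ms beyond the sleep
                if (hiccupMs > 10) {
                    System.out.println("pause of ~" + hiccupMs + " ms");
                }
                worstMs = Math.max(worstMs, hiccupMs);
            }
            System.out.println("worst observed hiccup: " + worstMs + " ms");
        }
    }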

Jon Harrop