views:

143

answers:

3

Recently I heard Kirk Pepperdine speak about changing garbage collectors for better performance -- but what exactly does that mean and what makes one garbage collector better or different than the other?

+2  A: 

Some collectors are better for throughput, others are better for response time. The difference is usually in how the collector chooses to pause the application. Some such as CMS use mutiple passes to triage the garbage before stopping the application. This triage can happen in a background thread while the application is running, and thus not interfere with your application as much as one that "stops the world" to do a GC.

Edit

Check out this document by sun. Also, about half way down there is a nice image showing the default mark-compact collector against the CMS collector. A picture is worth a thousand words, but the article is a good read too ;) Also worth reading is all the documents on the new G1 collector.

reccles
Thanks for the link.
tangens
+1  A: 

The basic problem is that the way that Java program sees memory (you call "new MyObject" and there it is, and when you are done with it you just forget about it) does not map very well to the underlying operating system and hardware.

The job of the garbage collector is to identify those memory areas which are not in use by an object, and "melt" them together to give a LARGE memory area from where new objects can be allocated. This is very vaguely worded in the Java specification HOW this is done, most likely in order to provide maximum flexibility for the designers of this important task.

Several approaches exist, with advantages and disadvantages. What you usually want is a garbage collector that can keep up in the background with the rate of objects being abandoned, as the only way for it to catch up is to stop the program while catching up. That gives really bad user experiences.

A typical trend for Java objects is that either they live for a very short time (current block or method) or a very long time. Modern garbage collectors deal with this by having multiple pools so that young objects are treated differnetly than old objects.

Thorbjørn Ravn Andersen
+3  A: 

You ask two questions:

What does it mean to change garbage collectors in Java for better performance?

This is a huge topic, and like some of the other responders, I urge you to do some reading. I recommend Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning from Sun. The information below mostly comes from there. The "turbo-charging" java article recommended in another answer is older.

In brief, one of the many options we have when running the JVM is to select a garbage collector, of which there are presently three:

  • The serial collector (selected with the -XX:+UseSerialGC option) - this uses a single thread to do all collection work, and everything waits while it happens.
  • The parallel collector (selected with the -XX:+UseParallelGC option) - this does minor collections (of the young generation) in parallel, but everything waits during the major collections.
  • The concurrent collector (selected with the -XX:+UseConcMarkSweepGC option) - this allows most collection operations to happen while the application is running.

What makes one garbage collector better than another?

Your application does. Each of the garbage collectors has a "sweet spot" - a range of application profiles for which it is the superior collector.

First, know that the VM is pretty good at selecting a collector for you, and as with most optimizations, you should not consider second-guessing it until you've identified that your application is not performing well, and that garbage collection is the likely culprit.

In that case, you have to ask these questions: 1) is your app running on a single-processor machine, or multi? 2) Are you more concerned with "minimizing pause time", or with "maximizing throughput"? That is, if you had to choose between the application never pausing but getting less work done overall, versus getting more work done overall, but pausing from time to time, which would you pick?

Roughly speaking, as a starting point:

  • On a Multi-processor machine, mostly concerned with minimizing pause time, you'd tend to use the Concurrent collector (consider enabling incremental mode)
  • On a Multi-processor machine, mostly concerned with maximizing throughput, you'd tend to use the Parallel collector (consider enabling parallel compaction)
  • On a Single-processor machine, with small datasets (up to roughly 100Mb), you'd tend to use the Serial collector
  • On a Single-processor machine, mostly concerned with maximizing throughput, you'd tend to use the Serial collector
  • On a Single-processor machine, mostly concerned with minimizing pause time, you'd tend to use the Concurrent collector (consider enabling incremental mode)

Again, though, the VM does a pretty good job of selecting a collector for you, and you're better off not overriding that unless and until you discover that it's not working well enough for your application.

CPerkins