views:

302

answers:

8

I've often read that in the Sun JVM short-lived objects ("relatively new objects") can be garbage collected more efficiently than long-lived objects ("relatively old objects")

  • Why is that so?
  • Is that specific to the Sun JVM or does this result from a general garbage collection principle?
A: 

This is based on the observation that the life-expectancy of an object goes up as it ages. So it makes sense to move objects to a less-frequently collected pool once they reach a certain age.

This isn't a fundamental property of the way programs use memory. You could write a pathological program that kept all objects around for a long time (and the same length of time for all objects), but this tends not to happen by accident.

Marcelo Cantos
Most of the code I churn out could be called pathological. Some of it is downright sociopathic :-)
paxdiablo
+5  A: 

This is generational garbage collection. It's used pretty widely these days. See more here: (wiki).

Essentially, the GC assumes that new objects are more likely to become unreachable than older ones.

Clint Tseng
+1  A: 

The JVM (usually) uses a generational garbage collector. This kind of collector separates the heap memory into several pools, according to the age of the objects in there. The reasoning here is based on the observation that most objects are short-lived, so that if you do a garbage collection on an area of memory with "young" objects, you can reclaim relatively more memory than if you do garbage collection across "older" objects.

In the Hotspot JVM, new objects get allocated in the so-called Eden area. When this area fills up, the JVM will sweep the Eden area (which does not take too much time, because it is not so big). Objects that are still alive are moved to the Survivor area, and the rest is discarded, freeing up Eden for the next generation. When the Eden collection is not sufficient does the the garbage collector move on to the older generations (which takes more work).

Thilo
+4  A: 

There this phenomena that "most objects die young". Many objects are created inside a method and never stored in a field. Therefore, as soon as the method exits these objects "die" and thus are candidate for collection at the next collection cycle.

Here is an example:

public String concatenate(int[] arr) { 
  StringBuilder sb = new StringBuilder();
  for(int i = 0; i < arr.length; ++i)
    sb.append(i > 0 ? "," : "").append(arr[i]);
  return sb.toString();
}

The sb object will become garbage as soon as the method returns.

By splitting the object space into two (or more) age-based areas the GC can be more efficient: instead of frequently scanning the entire heap, the GC frequently scans only the nursery (the young objects area) - which, obviously, takes much less time that a full heap scan. The older objects area is scanned less frequently.

Itay
+9  A: 

Most Java apps create Java objects and then discard them rather quickly eg. you create some objects in a method then once you exit the method all the object dies. Most apps behave this way and most people tend to code their apps this way. The Java heap is roughly broken up into 3 parts, permanent, old (long lived) generation, and young (short lived) generation. Young gen is further broken up into S1, S2 and eden. These are just heaps.

Most objects when they are created in young gen. The idea here is that since the mortality rate of objects are high, we quickly create them, use them and then discard them. Speed is of essence. As you create object, the young gen fills up until a minor GC occurs. In minor GC, all objects that are alive are copied over from eden and say S2 to S1. Then the 'pointer' is rested on eden and S2.

Every copy ages the object. By default, if an object survives 32 copies viz. 32 minor GC, then the GC figures that it is going to around a lot longer. So what it does is it tenure it to old gen. Old gen is just one big space. When the old gen fills up, a full GC happens in the old gen. Because there is no other space to copy to, the GC has to compact. It is a lot slower than minor GC.

You can tune the tenuring parameter with

java -XX:MaxTenuringThreshold=16 

if you know that you have lots of long lived objects. You can print the various age bucket of your app with

java -XX:-PrintTenuringDistribution
Chuk Lee
+1  A: 

All GCs behave that way. The basic idea is that you try to reduce the amount of objects that you need to check every time you run the GC because this is a pretty expensive operation. So if you have millions of objects but just need to check a few, that's way better than to have to check all of them. Also, a feature of GC plays into your hands: Temporary objects (which can't be reached by anyone anymore), have no cost during the GC run (well, let's ignore the finalize() method for now). Only objects which survive cost CPU time. Next, there is the observation that many objects are short lived.

Therefore, objects are created in a small space (called "Eden" or "young gen"). After a while, all objects that can be reached are copied (= expensive) out of this space and the space is then declared empty (so Java effectively forgets about all unreachable objects, so they don't have a cost since they don't have to be copied). Over time, long lived objects are moved to "older" spaces and the older spaces are swept less often to reduce the GC overhead (for example, every N runs, the GC will run an old space instead of the eden space).

Just to compare: If you allocate an object in C/C++, you need to call free() plus the destructor for each of them. This is one reason why GC is faster than traditional, manual memory management.

Of course, this is a rather simplified look. Today, working on GC is at the level of compiler design (i.e. done by very few people). GCs pull all kinds of tricks to make the whole process efficient and unnoticeable. See the Wikipedia article for some pointers.

Aaron Digulla
+1  A: 

Young objects are managed more efficiently (not only collected; accesses to young objects are also faster) because they are allocated in a special area (the "young generation"). That special area is more efficient because it is collected "in one go" (with all threads stopped) and neither the collector nor the applicative code has to deal with concurrent access from the other.

The trade-off, here, is that the "world" is stopped when the "efficient area" is collected. This may induce a noticeable pause. The JVM keeps pause times low by keeping the efficient area small enough. In other words, if there is an efficiently-managed area, then that area must be small.

A very common heuristic, applicable to many programs and programming languages, is that many objects are very short-lived, and most of the write accesses occur in young objects (those which were created recently). It is possible to write application code which does not work that way, but these heuristic will be "mostly true" on "most applications". Thus, it makes sense to store young objects in the efficiently-managed area. Which is what the JVM GC does, and which is why that efficient area is called the "young generation".

Note that there are systems where the whole memory is handled "efficiently". When the GC must run, the application becomes "frozen" for a few seconds. This is harmless for long-run computations, but detrimental to interactivity, which is why most modern GC-enabled programming environments use generational GC with a limited-size young generation.

Thomas Pornin
+3  A: 

(see above explanations for more general GC.. this answers WHY new is cheaper to GC than old).

The reason eden can be cleared faster is simple: the algorithm is proportional to the number of objects that will survive GC in the eden space, not proportional to the number of live objects in the whole heap. ie: if you have an average object death rate of 99% in eden (ie: 99% of objects do not survive GC, which is not abnormal), you only need to look at and copy that 1%. For "old" GC, all live objects in the full heap need to be marked/swept. That is significantly more expensive.

Trent Gray-Donald