I'm having a hard time dealing with a Java garbage collection problem, and interpreting the logs.

My application requires that no GC pause take longer than 2 seconds, and ideally less than 100ms.

Based on some previous advice, I'm trying the following command-line options:

 java -XX:MaxGCPauseMillis=100 -XX:NewRatio=9 -XX:+UseConcMarkSweepGC  -XX:+CMSIncrementalMode -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -server -Xmx9g -Xms9g

The application has a large number of long-lived objects, which are kept in a ConcurrentLinkedHashMap. I'm seeing occasional long pauses, in the worst case up to 10 seconds (that's the second-to-last line of the GC logs below)!

Here is some of the output I'm getting:

16938.968: [GC 16938.968: [ParNew: 153343K->17022K(153344K), 7.8608580 secs] 6184328K->6122510K(9420160K) icms_dc=7 , 7.8614100 secs] [Times: user=0.63 sys=0.01, real=7.86 secs] 
16947.087: [GC 16947.087: [ParNew: 153342K->17022K(153344K), 7.2604030 secs] 6258830K->6198642K(9420160K) icms_dc=7 , 7.2609780 secs] [Times: user=0.44 sys=0.00, real=7.27 secs] 
16954.614: [GC 16954.614: [ParNew: 153342K->17024K(153344K), 8.4307620 secs] 6334962K->6274625K(9420160K) icms_dc=7 , 8.4313150 secs] [Times: user=0.62 sys=0.01, real=8.43 secs] 
16963.310: [GC 16963.310: [ParNew: 153344K->17023K(153344K), 6.2588760 secs] 6410945K->6350748K(9420160K) icms_dc=7 , 6.2594290 secs] [Times: user=0.48 sys=0.01, real=6.25 secs] 
16969.834: [GC 16969.834: [ParNew: 153343K->17022K(153344K), 6.0274280 secs] 6487068K->6425868K(9420160K) icms_dc=7 , 6.0279830 secs] [Times: user=0.50 sys=0.01, real=6.03 secs] 
16976.122: [GC 16976.123: [ParNew: 153342K->17022K(153344K), 11.7774620 secs] 6562188K->6503030K(9420160K) icms_dc=7 , 11.7780180 secs] [Times: user=0.43 sys=0.04, real=11.78 secs] 
16988.164: [GC 16988.164: [ParNew: 153342K->17024K(153344K), 10.9477920 secs] 6639350K->6579928K(9420160K) icms_dc=7 , 10.9483440 secs] [Times: user=0.37 sys=0.02, real=10.95 secs] 
16999.371: [GC 16999.372: [ParNew: 153344K->17023K(153344K), 9.8828360 secs] 6716248K->6655886K(9420160K) icms_dc=7 , 9.8833940 secs] [Times: user=0.42 sys=0.01, real=9.88 secs] 
17009.509: [GC 17009.509: [ParNew: 153343K->17023K(153344K), 5.0699960 secs] 6792206K->6727987K(9420160K) icms_dc=7 , 5.0705660 secs] [Times: user=0.55 sys=0.01, real=5.07 secs] 
17014.838: [GC 17014.838: [ParNew: 153343K->17023K(153344K), 6.6411750 secs] 6864307K->6790974K(9420160K) icms_dc=7 , 6.6417400 secs] [Times: user=0.37 sys=0.01, real=6.63 secs] 
17021.735: [GC 17021.735: [ParNew: 153343K->17024K(153344K), 8.0545970 secs] 6927294K->6856409K(9420160K) icms_dc=7 , 8.0551790 secs] [Times: user=0.34 sys=0.03, real=8.05 secs] 
17030.052: [GC 17030.053: [ParNew: 153344K->17023K(153344K), 7.9756730 secs] 6992729K->6922569K(9420160K) icms_dc=7 , 7.9762530 secs] [Times: user=0.34 sys=0.01, real=7.98 secs] 
17038.398: [GC 17038.398: [ParNew: 153343K->17022K(153344K), 12.9613300 secs] 7058889K->6990725K(9420160K) icms_dc=7 , 12.9618850 secs] [Times: user=0.39 sys=0.01, real=12.96 secs] 
17051.630: [GC 17051.630: [ParNew: 153342K->17022K(153344K), 6.8942910 secs] 7127045K->7059607K(9420160K) icms_dc=7 , 6.8948380 secs] [Times: user=0.56 sys=0.02, real=6.89 secs] 
17058.798: [GC 17058.798: [ParNew: 153342K->17024K(153344K), 10.0262190 secs] 7195927K->7126351K(9420160K) icms_dc=7 , 10.0267860 secs] [Times: user=0.37 sys=0.01, real=10.02 secs] 
17069.096: [GC 17069.096: [ParNew: 153344K->17023K(153344K), 10.0419500 secs] 7262671K->7195002K(9420160K) icms_dc=7 , 10.0425020 secs] [Times: user=0.40 sys=0.02, real=10.04 secs] 
17079.410: [GC 17079.410: [ParNew: 153343K->17022K(153344K), 13.5389040 secs] 7331322K->7264275K(9420160K) icms_dc=7 , 13.5394610 secs] [Times: user=0.30 sys=0.01, real=13.54 secs] 
17093.223: [GC 17093.224: [ParNew: 153342K->17023K(153344K), 10.5909450 secs] 7400595K->7330446K(9420160K) icms_dc=7 , 10.5915060 secs] [Times: user=0.33 sys=0.00, real=10.58 secs] 
17104.083: [GC 17104.084: [ParNew: 153343K->17024K(153344K), 5.8420210 secs] 7466766K->7392173K(9420160K) icms_dc=7 , 5.8425920 secs] [Times: user=0.57 sys=0.00, real=5.84 secs]

I've spent hours poring over the various web pages that describe Java GC tuning, but none of them has really given me the ability to interpret the logs above and come up with a course of action. Any specific advice based on the logs I've provided would be greatly appreciated.

Update: Per a question below:

The machine has 16G of RAM. Here is the info from top:

    Mem:  15483904k total, 15280084k used,  203820k free,  155684k buffers
    Swap:  2031608k total,  1347240k used,  684368k free, 3304044k cached

It's a different run, but here is the current top output for the process:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  1016 sanity   17   0 10.2g 6.5g 9464 S    1 44.2  10:24.32 java

Update 2: Some more detailed logging; this one looked like it caused a 400ms delay:

{Heap before GC invocations=1331 (full 1):
 par new generation   total 153344K, used 153343K [0x00002aaaae200000, 0x00002aaab8860000, 0x00002aaab8860000)
  eden space 136320K, 100% used [0x00002aaaae200000, 0x00002aaab6720000, 0x00002aaab6720000)
  from space 17024K,  99% used [0x00002aaab77c0000, 0x00002aaab885fff0, 0x00002aaab8860000)
  to   space 17024K,   0% used [0x00002aaab6720000, 0x00002aaab6720000, 0x00002aaab77c0000)
 concurrent mark-sweep generation total 7169664K, used 4258496K [0x00002aaab8860000, 0x00002aac6e200000, 0x00002aac6e200000)
 concurrent-mark-sweep perm gen total 21248K, used 13269K [0x00002aac6e200000, 0x00002aac6f6c0000, 0x00002aac73600000)
484.738: [GC 484.738: [ParNew: 153343K->17022K(153344K), 0.3950480 secs] 4411840K->4341689K(7323008K), 0.3954820 secs] [Times: user=0.49 sys=0.07, real=0.40 secs] 
Heap after GC invocations=1332 (full 1):
 par new generation   total 153344K, used 17022K [0x00002aaaae200000, 0x00002aaab8860000, 0x00002aaab8860000)
  eden space 136320K,   0% used [0x00002aaaae200000, 0x00002aaaae200000, 0x00002aaab6720000)
  from space 17024K,  99% used [0x00002aaab6720000, 0x00002aaab77bfb68, 0x00002aaab77c0000)
  to   space 17024K,   0% used [0x00002aaab77c0000, 0x00002aaab77c0000, 0x00002aaab8860000)
 concurrent mark-sweep generation total 7169664K, used 4324666K [0x00002aaab8860000, 0x00002aac6e200000, 0x00002aac6e200000)
 concurrent-mark-sweep perm gen total 21248K, used 13269K [0x00002aac6e200000, 0x00002aac6f6c0000, 0x00002aac73600000)
}
+1  A: 

Perhaps the 100ms pause target is too strict and you need to manage garbage collection yourself? Have you tried this with a larger limit?

Ido
You may be right, I will try using a more forgiving target
sanity
+3  A: 

In general, it's tough to get GC tuning right when you require such a large heap.

That being said, most of GC hang time is caused by the memory copying of objects that survive across young garbage collections.

Is your ConcurrentLinkedHashMap initialized with all the memory-persistent objects all at once, or does it slowly grow as the application keeps running? If it's the latter, it may be difficult to cut down your GC hang times, since there will always be objects that survive. If it's the former, you will need to size your tenured generation to the size of your persistent objects plus 20% or so, and make sure that your young gen is large enough to keep up with the transient objects created during the course of your application.
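
(A hedged sketch, not from this answer: with CMS you could size the generations explicitly rather than relying on NewRatio; the sizes below are placeholders that would have to be derived from your measured live set and allocation rate.)

    java -server -Xms9g -Xmx9g \
         -XX:NewSize=512m -XX:MaxNewSize=512m -XX:SurvivorRatio=8 \
         -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps ...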

bajafresh4life
+1  A: 

From the times, it seems the GC isn't actually running for most of each pause (see the user times); most of the time it is simply waiting.

Just a wild guess: could it be swapping? How much memory does the machine have? How much of it does the java process get (resident set size)?

Edit: why is it waiting:

Look at this (from your transcript)

[Times: user=0.39 sys=0.01, real=12.96 secs]

This means that (I guess from the beginning to the end of the GC) almost 13 seconds of wall-clock time passed. Of those, only 0.39 s was spent running in user mode and 0.01 s in kernel mode. Unless the time measurement is totally flawed (i.e. the numbers really do represent the running time of the GC process/thread), that leaves at least 12 seconds of waiting.
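
(Not part of the original answer, but one quick way to confirm the guess is to watch swap activity while a long pause is happening, e.g. with vmstat; the 5-second interval is arbitrary.)

    vmstat 5
    # the "si" / "so" columns show swap-in / swap-out traffic;
    # sustained non-zero values during the pauses point at paging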

jpalecek
Good questions, will update question with info
sanity
Can you go into more detail about why it would spend most of the time waiting?
sanity
Edited to explain that
jpalecek
Thanks for the additional explanation, but I guess the question is: what could it be waiting for? IO? Pulling data into RAM from swap?
sanity
Probably. You can tell that by checking the major fault count (top doesn't give that info, but atop does)
jpalecek
... ehm, that should have read: probably swap.
jpalecek
A: 

If you have strict timing requirements, perhaps you should check into the Java Real-Time System.

RTSJ / Java RTS provides:

An API set, semantic Java VM enhancements, and JVM-to-OS layer modifications which allow Java developers to correctly reason about and control the temporal behavior of Java applications.

Michael Myers
A: 

Have you run your app through a profiler to confirm that what you think is happening with respect to memory is what is actually happening?

One thing I do when I am looking into this is use the NetBeans profiler (though any profiler should give you this): look at the live bytes (the memory allocation) and check whether the classes with a large number of allocated bytes and allocated objects are the ones I expect, and are coming from where I expect.

You can also use the profiler to look at the effect your tuning is having: run the app without any tuning args, then add your tuning args and run it again, and see what is going on with the memory.

TofuBeer
A: 

It can be very hard to tell without actually seeing, and in some cases profiling, the code.

Have you implemented finalize() for any of your objects? That will cause a big GC penalty. It would also be interesting to see a test run with a heap of maybe 6 GB; if you get a disproportionate improvement in performance, it would indicate that the GC is thrashing for memory.

patros
A: 

I think your attention might be a little misdirected.

Spend a little time in a profiler finding your biggest allocation hotspots. If there are just a few places in your code where the majority of your allocations occur, try using an object pool rather than always constructing new objects.

Collection classes and StringBuilders are great candidates for pooling. When you return them to the pool, call the collection.clear() or the stringbuilder.setLength(0) methods, so that they're ready for consumption when the next caller wants to retrieve them from the pool.

The best way to tune the GC is by creating fewer objects. There are lots of strategies for eliminating allocations, and pooling is just one of them (though one of my favorites).
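
(Not part of the original answer: a minimal sketch of what such a pool might look like for StringBuilder. The class and method names are made up for illustration, and a real pool would need a size cap and a clear threading policy.)

    import java.util.ArrayDeque;
    import java.util.Deque;

    /** Illustrative only: a tiny StringBuilder pool. Borrow, use, give back. */
    final class StringBuilderPool {
        private final Deque<StringBuilder> free = new ArrayDeque<StringBuilder>();

        synchronized StringBuilder borrow() {
            StringBuilder sb = free.pollFirst();
            return (sb != null) ? sb : new StringBuilder(256); // create lazily
        }

        synchronized void giveBack(StringBuilder sb) {
            sb.setLength(0);   // reset so the next caller gets a clean builder
            free.addFirst(sb); // unbounded here; cap the pool size in real code
        }
    }

Typical usage is borrow() at the start of the hot path and giveBack() in a finally block.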

benjismith
Object pooling is actually discouraged by the Sun GC people; the GC is specifically tuned for lots of short-lived objects, and it especially likes objects that don't change.
Paul de Vrieze
I should have read the question more closely, especially where he says the majority of his objects are long-lived. But I stick with my suggestion for cases where people are experiencing GC lag on short-lived objects. Even if Sun discourages it, my ten years of experience profiling and tuning Java apps tell me that pooling can sometimes be a great solution with multiple benefits: not only does it reduce the burden on the GC, it can also significantly decrease time spent in constructors.
benjismith
A: 

Can you post/link to the code of the ConcurrentLinkedHashMap implementation? If this is the implementation I've posted, please open a ticket on the project page so we can debug it together. If not, knowing the details of your implementation would help to determine where the issue might lie.

Ben Manes
+1  A: 

A 9 GB JVM! Never seen that before! I think your 10 sec. pauses are quite normal for a heap that size. See this (maybe you've read it already...).

Tusc
+1  A: 

Turns out the problem was that the heap was getting swapped out to disk, and the delay was Java's GC having to wait for it to be swapped back in.

Solved (mostly) by setting Linux's "swappiness" parameter to 0.
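
(For reference, a sketch of how that is typically done on Linux; exact file locations can vary by distribution.)

    sysctl -w vm.swappiness=0        # takes effect immediately
    # to survive reboots, add the following line to /etc/sysctl.conf:
    # vm.swappiness = 0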

sanity
A: 

I think you are hitting the known UseConcMarkSweepGC and NewRatio bug, since your new space is nowhere near one tenth of -Xmx9g. The bug report includes a workaround (set NewSize as an absolute size instead).

Another flag that might be very important for you is CMSInitiatingOccupancyFraction. It defaults to 92% in Java 6 and was 68% in Java 5. Once the old space grows beyond that fraction, the CMS threads start doing their work. If you have CPU to spare, there is no danger in having a live set that is above the initiating fraction.
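
(A sketch of how those two suggestions might look on the command line; the concrete values are placeholders. -XX:+UseCMSInitiatingOccupancyOnly is not mentioned in the answer, but it makes the JVM honor the fraction instead of treating it only as an initial hint.)

    java ... -XX:NewSize=256m -XX:MaxNewSize=256m \
         -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly ...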

It would be nice if you had included the GC stats after you fixed the memory paging problem.

Christian