views: 760
answers: 7

Hello

I've got (the currently latest) JDK 1.6.0_18 crashing while running a web application on (the currently latest) Tomcat 6.0.24, unexpectedly, after anywhere from 4 hours to 8 days of stress testing (30 threads hitting the app at 6 million pageviews/day). This is on RHEL 5.2 (Tikanga).

The crash report is at http://pastebin.com/f639a6cf1 and the consistent parts of the crash are:

  • a SIGSEGV is being thrown
  • on libjvm.so
  • eden space is always full (100%)

JVM runs with the following options:

CATALINA_OPTS="-server -Xms512m -Xmx1024m -Djava.awt.headless=true"

I've also tested the memory for hardware problems using http://memtest.org/ for 48 hours (14 passes of the whole memory) without any error.

I've enabled -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps to look for any GC trends or space exhaustion, but there is nothing suspicious there. GC and full GC happen at predictable intervals, almost always freeing the same amount of memory.
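
For reference, the combined option line looks roughly like this (the -Xloggc path below is only illustrative, not part of my actual setup):

CATALINA_OPTS="-server -Xms512m -Xmx1024m -Djava.awt.headless=true -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/gc.log"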

My application does not, directly, use any native code.

Any ideas of where I should look next?

Edit - more info:

1) There is no client vm in this JDK:

[foo@localhost ~]$ java -version -server
java version "1.6.0_18"
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)

[foo@localhost ~]$ java -version -client
java version "1.6.0_18"
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)

2) Changing the O/S is not possible.

3) I don't want to change the JMeter stress test variables, since this could hide the problem. I've got a use case (the current stress test scenario) which crashes the JVM, so I'd like to fix the crash rather than change the test.

4) I've done static analysis on my application but nothing serious came up.

5) The memory does not grow over time. Memory usage stabilizes very quickly after startup into a very steady pattern, which does not look suspicious.

6) /var/log/messages does not contain any useful information before or during the time of the crash.

More info: Forgot to mention that there was an Apache (2.2.14) fronting Tomcat using mod_jk 1.2.28. Right now I'm running the test without Apache, just in case the JVM crash is related to the mod_jk native code which connects to the JVM (the Tomcat connector).

After that (if the JVM crashes again) I'll try removing some components from my application (caching, Lucene, Quartz), and later on I'll try using Jetty. Since the crash currently happens anywhere between 4 hours and 8 days in, it may take a long time to find out what's going on.

+3  A: 

A few ideas:

  • Use a different JDK, Tomcat and/or OS version
  • Slightly modify test parameters, e.g. 25 threads at 7.2 M pageviews/day
  • Monitor or profile memory usage
  • Debug or tune the Garbage Collector
  • Run static and dynamic analysis
kiwicptn
+1  A: 

Does your memory grow over time? If so, I suggest lowering the memory limits to see if the system fails more frequently when memory is exhausted.

Can you reproduce the problem faster if:

  • You decrease the memory available to the JVM (a one-line sketch follows this list)?
  • You decrease the available system resources (i.e. drain system memory so the JVM does not have enough)?
  • You change your use cases to a simpler model?
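
For instance, a faster-failing run could use a smaller heap along these lines (the numbers are just an example, not a recommendation):

CATALINA_OPTS="-server -Xms128m -Xmx256m -Djava.awt.headless=true"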

One of the main strategies that I have used is to determine which use case is causing the problem. It might be a generic issue, or it might be use case specific. Try logging the start and end of use cases to see if you can determine which ones are more likely to cause the problem. If you partition your use cases in half, see which half fails faster; that half likely contains the more frequent cause of the failure. Naturally, running a few trials of each configuration will increase the accuracy of your measurements.
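
As a rough sketch of what I mean by logging use cases, a servlet filter along these lines (class name and logging destination are placeholders, not anything from your app) records when each request starts and ends, so a crash can be matched to whatever was in flight at the time:

import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;

// Hypothetical filter: logs the start/end of every request so a crash can be
// correlated with the use case that was running when it happened.
public class UseCaseLoggingFilter implements Filter {
    public void init(FilterConfig config) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        String uri = ((HttpServletRequest) req).getRequestURI();
        long start = System.currentTimeMillis();
        System.out.println("START " + uri);
        try {
            chain.doFilter(req, res);
        } finally {
            System.out.println("END   " + uri + " (" + (System.currentTimeMillis() - start) + " ms)");
        }
    }
}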

I have also been known either to make the server do very little work per request, or to loop over the work that the server is doing. Looping makes your application code work a lot harder; doing little work per request makes the web server and application server work a lot harder.

Good luck, Jacob

TheJacobTaylor
Looking at your trace, system memory should not be the issue in this case. Are there any messages in the system log? Also, if I am reading it right, it looks like you might have a rather high number of threads running. There are a ton of threads waiting for available CPU at any given time. I would expect faster average response times with a smaller number of threads.
TheJacobTaylor
+1  A: 

Try switching your servlet container from Tomcat to Jetty http://jetty.codehaus.org/jetty/.

crowne
To see whether the JVM will still crash? Or for completely migrating to jetty?
cherouvim
I would go for completely migrating to Jetty, just because I like what I've seen from Jetty in the past. However, the latest comparisons that I've just googled seem to show that, performance-wise, Jetty 6 and Tomcat 6 are fairly equal, although Jetty does come across as having a lighter memory footprint. From a more methodical standpoint, as long as your application is standards compliant the migration shouldn't be too tough, and then you may be able to either eliminate the container as the root cause or confirm your application as the root cause. Good luck.
crowne
@crowne: thanks for the comment. My application is compliant with all major servers (Tomcat, JBoss, Resin, Jetty, GlassFish) so migration is no problem. I'll definitely try out the stress test on Jetty.
cherouvim
+1  A: 

If I was you, I'd do the following:

  • try slightly older Tomcat/JVM versions. You seem to be running the newest and greatest. I'd go down two versions or so, possibly try JRockit JVM.
  • do a thread dump (kill -3 java_pid) while the app is running to see the full stacks. Your current dump shows lots of threads being blocked, but it is not clear where they block (I/O? some internal lock starvation? anything else?). I'd even schedule kill -3 to run every minute, so you can compare a random thread dump with the one taken just before the crash.
  • I have seen cases where a Linux JDK just dies whereas a Windows JDK is able to gracefully catch an exception (it was a StackOverflowError in that case), so if you can modify the code, add a "catch Throwable" somewhere in the top class, just in case (a rough sketch follows this list).
  • Play with GC tuning options. Turn concurrent GC on/off, adjust NewSize/MaxNewSize. And yes, this is not scientific - more a desperate need for a working solution. More details here: http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html
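
A very rough sketch of the "catch Throwable" idea, assuming your app has (or can be given) a single top-level servlet to wrap; the class name and the logging are placeholders:

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.*;

// Hypothetical top-level servlet: wraps normal processing in catch Throwable
// so anything unusual gets logged before the information is lost.
public class TopLevelServlet extends HttpServlet {
    protected void service(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        try {
            super.service(req, resp); // dispatches to doGet/doPost as usual
        } catch (Throwable t) {
            t.printStackTrace();      // replace with your logging of choice
            throw new ServletException(t);
        }
    }
}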

Let us know how this was sorted out!

mindas
+2  A: 

Have you tried different hardware? It looks like you're using a 64-bit architecture. In my own experience 32-bit is faster and more stable. Perhaps there's a hardware issue somewhere too. A crash window of anywhere between 4 hours and 8 days is quite spread out for a pure software issue. Although you do say the system log has no errors, so I could be way off. I still think it's worth a try.

Daniil
Trying out different hardware is not an option, but I'll try the 32bit jvm. Thanks
cherouvim
+1  A: 

Is it an option to go to the 32-bit JVM instead? I believe it is the most mature offering from Sun.

Thorbjørn Ravn Andersen
Will try that out. Thanks.
cherouvim
+2  A: 

Do you have compiler output? i.e. PrintCompilation (and, if you're feeling particularly brave, LogCompilation).

I have debugged a case like this in the past by watching what the compiler is doing and, eventually (this took a long time until the light bulb moment), realising that my crash was caused by the compilation of a particular method in the Oracle JDBC driver.

So basically what I'd do is:

  • switch on PrintCompilation
  • since that doesn't give timestamps, write a script that watches the logfile (e.g. sleep for a second, print any new rows) and reports when methods were (or weren't) compiled
  • repeat the test
  • check the compiler output to see if the crash corresponds with compilation of some method
  • repeat a few more times to see if there is a pattern

If there is a discernible pattern, then use .hotspot_compiler (or .hotspotrc) to make it stop compiling the offending method(s), repeat the test and see if it doesn't blow up. Obviously, in your case this process could theoretically take months, I'm afraid.
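
For example, a .hotspot_compiler file dropped in the JVM's working directory could contain something like this (the class and method here are placeholders for whatever your compilation log points at):

exclude com/example/SomeSuspectClass someSuspectMethod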

The other thing I'd do is systematically change the GC algorithm you're using and check the crash times against GC activity (e.g. does it correlate with a young or old GC? what about TLABs?). Your dump indicates you're using parallel scavenge, so try the following (rough flags are sketched after this list):

  • the serial (young) collector (IIRC it can be combined with a parallel old)
  • ParNew + CMS
  • G1
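
Roughly, the flags involved are the following (I haven't re-checked exactly which of these are available and stable on 6u18, so treat this as a starting point rather than a recipe):

serial collector:  -XX:+UseSerialGC
ParNew + CMS:      -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
G1 (experimental): -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC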

If it doesn't recur with a different GC algorithm then you know it's down to that (and you have no fix other than to change the GC algorithm and/or walk back through older JVMs until you find a version of that algorithm that doesn't blow up).

Cheers Matt

Matt
Thanks for bringing PrintCompilation to my attention. Will definitely try this out.
cherouvim