We were having a problem with our Tomcat JVM crashing at random times and leaving an hs_* dump. The crash was always in the same spot, and the dump wasn't very informative beyond reporting an EXCEPTION_ACCESS_VIOLATION. Commenting out various parts of the Java code that called particular JNI functions just made it crash consistently in another spot.

We changed our JVM options from:

set PAF_OPTS=-Xms1024m -Xmx32000m -server -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:+DisableExplicitGC -XX:+UseCompressedOops -Djava.library.path="%CATALINA_HOME%"\jni -Dcom.sun.management.jmxremote

to:

set PAF_OPTS=-Xms1024m -Xmx32000m -server -XX:+DisableExplicitGC -XX:+UseCompressedOops -Djava.library.path="%CATALINA_HOME%"\jni -Dcom.sun.management.jmxremote

The only difference is that -XX:+UseParallelGC and -XX:+UseParallelOldGC were removed, and the problem went away. The solution does not give me a warm and fuzzy feeling, however, and I am wondering whether anyone understands what's going on under the covers here.

Environment: JDK 1.6, 64-bit OS and Java, Tomcat, Windows

A: 

Most likely you have heap or stack corruption caused by your use of JNI.

It looks like you simplified your GC options and now rely more on the defaults; the -XX options are unsupported and might be buggy.

In your working version you don't have parallel GC enabled, so collection will all be single-threaded. If you have any finalizers that call JNI to free native memory, perhaps they are not thread-safe?
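To make the finalizer concern concrete, here is a minimal sketch of guarding a native free so it runs at most once, whether it is triggered by an explicit call or by the finalizer thread. Everything in it is an assumption for illustration: `NativeHandle`, `release()` and `nativeFree()` are hypothetical names, and `nativeFree()` only increments a counter where a real class would declare a `native` method. Note that this guard only prevents a double free of the same handle; it does not make the native library itself thread-safe.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper around a native resource (names are illustrative only).
class NativeHandle {
    // Counts how many times the simulated native free actually ran.
    static final AtomicInteger freeCalls = new AtomicInteger();

    // Flips to true exactly once, no matter how many threads race on it.
    private final AtomicBoolean released = new AtomicBoolean(false);

    // Stand-in for a real "private native void nativeFree(long ptr)".
    private void nativeFree() {
        freeCalls.incrementAndGet();
    }

    // Safe to call from application threads and from the finalizer thread.
    public void release() {
        if (released.compareAndSet(false, true)) {
            nativeFree();
        }
    }

    @Override
    protected void finalize() throws Throwable {
        try {
            release();
        } finally {
            super.finalize();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final NativeHandle handle = new NativeHandle();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    handle.release();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("native free calls: " + freeCalls.get());
    }
}
```

If the real free routine touches shared native state (a global allocator table, for example), it would additionally need a lock on the native side or a single shared lock around every JNI call that uses that state.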

What does the hs_* dump say the Java stack looked like? I've seen some app servers that delegate to native SSL libraries have issues when garbage collecting a socket. Sometimes you can guess at the problem if you know what consistently blows up.

Have you checked for bug fixes released since your version of the JDK?

Does your JNI code allocate or free any memory? If so, it might be an incompatibility between the JVM's MSVCR version and the MSVCR version your DLLs use.

Justin
Frees were all matched with mallocs; the JNI code is straightforward. We're using JDK 1.6.18. I believe 19 is out, but the release notes show no signs of this problem. Ergonomics is supposed to determine the GC and, from what I've read, the parallel GC will be used on a 64-bit machine with 8 or 16 CPUs just by the JVM detecting it. I do believe the stack is getting trashed somehow, because this is printed in the dump: [error occurred during error reporting (printing native stack), id 0xc0000005]. Other possibilities are how our DLLs are compiled. They are not compiled with the dynmc flg
jim hale
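Whether ergonomics really picks the parallel collector once the explicit flags are gone can be checked from inside the running JVM rather than guessed. The sketch below uses the standard `java.lang.management` API; the class name `GcReport` and the helper `collectorNames()` are made up for this example. On HotSpot the parallel collectors usually report names such as "PS Scavenge" and "PS MarkSweep", but treat the exact strings as an assumption to confirm on your own JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which collectors and flags the JVM is running with.
class GcReport {
    // Returns the names of the garbage collectors active in this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Collectors: " + collectorNames());
        // The arguments the JVM was actually started with (e.g. the contents of PAF_OPTS).
        System.out.println("JVM flags:  "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```

Running this once with each set of options would show whether removing the two flags actually changed the collector in use, or whether the crash went away for some other reason.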
A: 

It could also be that you are hanging on to references to Java objects in your JNI code without first calling NewGlobalRef/NewLocalRef on them. If the GC tidied up those objects and you then referenced them from JNI code, the native code would crash.

Take a look at: http://java.sun.com/j2se/1.4.2/docs/guide/jni/spec/functions.html#wp16270

Peter Smith