EDIT: This reproducible SIGSEGV happens on a Linux machine with more than one proc and more than 2GB of mem, so Java is defaulting to the -server mode. Interestingly enough if I force "-client" there's no crash anymore... (I'm still not too sure what to do with my reproducible SIGSEGV but it's interesting nonetheless).
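To confirm which VM a given launch actually picked, the application can print it at startup. This is only a minimal sketch: the class name `VmModeCheck` and the helper `isServerVm` are made up for illustration, and the check relies on the VM name containing the word "Server", as it does in the `java -version` output further down.

```java
class VmModeCheck {
    /** Returns true when the reported VM name mentions "Server" (e.g. "Java HotSpot(TM) Server VM"). */
    static boolean isServerVm(String vmName) {
        return vmName != null && vmName.contains("Server");
    }

    public static void main(String[] args) {
        // Standard system property holding the name shown by "java -version".
        String vmName = System.getProperty("java.vm.name");
        int cpus = Runtime.getRuntime().availableProcessors();
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);

        System.out.println("VM name:         " + vmName);
        System.out.println("Server VM:       " + isServerVm(vmName));
        System.out.println("Processors seen: " + cpus);
        System.out.println("Max heap (MB):   " + maxHeapMb);
    }
}
```

Running it once with `-client` and once without any flag shows whether the machine really defaults to the server VM.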

First note that this is a bit related but not identical to the following because in our case it's only a SIGSEGV that happens, and we can reliably trigger it:

http://stackoverflow.com/questions/2297920/jvm-outofmemory-error-death-spiral-not-memory-leak

It's related because it happens when we feed our app with a "deluge of data": data are coming from text files and then number-crunched (yes, financial number crunching in Java).

I can reliably trigger a JVM to SIGSEGV using only valid Java code.

NOTE: I can invariably crash both JVM 1.6.0_17 and JVM 1.6.0_18, and this question is not about how to work around this issue (for example playing with VM parameters may fix the issue but I'm not after that, I want to know what to do with this always-reproducible SIGSEGV).

I've got a workaround which simply consists in using Java 1.5 when launching our app (while still using Java 1.6 to run IntelliJ IDEA, etc. on the same machine, simultaneously), but my question is if this should be reported or not and, if it should, how to report it knowing that the log itself contains proprietary information (the full hs_err_..._log).

Hardware error can be ruled out because:

  • this is happening on a workstation that regularly reaches months of uptime (I only reboot it when critical security patches affecting my trimmed down and hardened Debian Linux are issued, which really doesn't happen often) and on which applications never crash (making it very unlikely that it's a hardware issue on that machine [more below])

  • same application works perfectly on that same machine under a JVM 1.5 under the same load (this is how I'm testing the app: I simply launch it under a 1.5 VM)

  • same application works perfectly fine on more than one hundred client machines under the same (gigantic) load (never crashed once on Windows + JVM 1.5 or 1.6 and never crashed once on OS X + JVM 1.5 or 1.6 [a crash would mean an instant phone call from the client])

  • other applications on that same machine and same 1.6.0_17 or 1.6.0_18 JVM never crash (for example I've got two instances of IntelliJ IDEA running as two different users on that same machine and they don't crash)

  • machine is tested with memtest "regularly" (before installing a new OS, which last happened when I installed Debian Lenny, not that long ago)

Here's the reproducible-on-demand SIGSEGV:

... $ uname -a
Linux saturn 2.6.26-2-686 #1 SMP Wed Nov 4 20:45:37 UTC 2009 i686 GNU/Linux
... $ export PATH=/home/wizard/jdk1.6.0_17/bin:$PATH
... $ java -version
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)

Launch the app, feed it a "deluge of data", wait a few seconds...

Then, invariably, for 1.6.0_17:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xb76d0080, pid=30793, tid=2514328464
#
# JRE version: 6.0_17-b04
# Java VM: Java HotSpot(TM) Server VM (14.3-b01 mixed mode linux-x86 )
# Problematic frame:
# V  [libjvm.so+0x4bc080]
#
# An error report file with more information is saved as:
# /home/wizard/hs_err_pid30793.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp

(note that the line '[libjvm.so+0x4bc080]' is consistent for 1.6.0_17 at every SIGSEGV)

or for 1.6.0_18:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xb77468f0, pid=722, tid=2514516880
#
# JRE version: 6.0_18-b07
# Java VM: Java HotSpot(TM) Server VM (16.0-b13 mixed mode linux-x86 )
# Problematic frame:
# V  [libjvm.so+0x4d88f0]
#
# An error report file with more information is saved as:
# /home/wizard/hs_err_pid722.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#
Aborted

(note that the line "[libjvm.so+0x4d88f0]" is consistent for 1.6.0_18 at every SIGSEGV)

The problem is that the log file contains proprietary information that cannot be shared.

Producing a "tiny test case" that reproduces the issue isn't realistic either: it's similar to the issue linked above, this only happens when a "deluge of data" is fed to the app.

Note that the exact same application, on exactly the same hardware, with exactly the same JVM but another version of Linux (I had Debian Etch previously) did NOT trigger that SIGSEGV once.

But this doesn't mean the JVM isn't at fault: it could still be a JVM issue.

Should I report this and how? (keeping in mind that writing a "reproducible tiny test case" is delusional and that the log contains proprietary information that shouldn't be leaked). Should I just edit the log and send it?

What's the procedure to report such a reproducible SIGSEGV when your log contains proprietary information and when a test case reproducing the issue isn't realistically doable?

Did any of you have success opening such a bug and then see it solved in a subsequent Java release?

Do you think it's good "for the Java community" to report such an issue or I just shouldn't bother because it's not important?

+3  A: 

The problem is that the log file contains proprietary information that cannot be shared. Reproducing a "tiny test case" that reproduce the issue ain't realistic either

If you can't provide Sun with a reproducible test case, they won't even look at it. Chances are good that they will ignore it even if you do provide a usable test case. The bug submission process at Sun leaves a lot to be desired.

Should I report this and how?

If you can't come up with a reproducible test case, don't bother. If they can't reproduce the issue, what do you expect them to do?

Note that the exact same application, on exactly the same hardware, with exactly the same JVM but another version of Linux (I had Debian Etch previously) did NOT trigger that SIGSEGV once.

Does it work on a different box with the same hardware and same version of Linux?

Kevin
I'm sure that buying support gets you a LOT more attention. How much, depends on the level you buy.
Thorbjørn Ravn Andersen
@Kevin: ah damn... I could dd my hd to another one and hence try with the exact same Linux kernel/configuration and JVMs to see if the SIGSEGV is also reproducible but what you're writing there is quite depressing. A test case would mean hundreds of Megabytes of data to send. Oh well, if it's reproducible on any hardware maybe I should just ship the harddisk or make a Bootable-CD that can reproduce the problem :) (I'm half-serious) What about the OpenJDK? Would things be different if I could reliably reproduce this under the OpenJDK 7 ?
Webinator
@WizardOfOdds: you say there's proprietary information in the log file. Could you write a parser or something to anonymize this data, and then send your log file to Sun?
Valentin Rocher
A: 

The very first question you should ask yourself is:

  • Am I using an officially supported Linux distribution?

If not, switch to one that is.

If you are, then report it to Sun!

Thorbjørn Ravn Andersen
Webinator
Supported by the entity that has produced the JVM you are using. Sun does not say that their Java will run on any Linux distribution in existence, but they say that they "support" the distributions listed on http://java.sun.com/javase/6/webnotes/install/system-configurations.html (where "support" means even consider listening to bugreports). Debian is not there, but Ubuntu is. Use that instead.
Thorbjørn Ravn Andersen
@Thorbjørn: Oh ok I see what you mean (thanks for the link too)... That said Ubuntu is actually Debian based :) Debian is the most highly respected distribution by sysadmins and powers a lot of the Real-World [TM] servers, I'm not switching to any other Linux distro ;) That said the issue is not the SIGSEGV (for I've got workarounds) but what to do with it... :)
Webinator
A: 

If it helps, the bug submission link in your crash report has this disclaimer:

In addition, Sun Microsystems respects your desire for privacy. Personal data collected from this program will not be sold, given or shared with organizations external to Sun. We will use this data for communications with you to clarify issues regarding the report you submitted and/or status of that report. The issues that you report may be made available to other JDC Members or Sun customers, however your personal data will be kept confidential. If you are not comfortable with the above conditions, please do not press the Submit button. If you have any questions, please refer to our Privacy Policy.

Personally, I would report it if it was feasible to hand over the code segment in question with logs, if the data is not too sensitive (perhaps data can be masked or obfuscated in logs?).
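As a rough illustration of what masking the log could look like, here is a minimal sketch. It assumes the sensitive parts are absolute paths under `/home` and fully qualified class names under a known package prefix. The class `HsErrRedactor`, the placeholders and the `com.acme.` prefix are all hypothetical, and a real hs_err log may well contain other sensitive fields that this does not catch, so the output still needs a manual read-through.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Pattern;

class HsErrRedactor {
    // Assumption: proprietary paths all live under /home; adjust for your layout.
    private static final Pattern HOME_PATH = Pattern.compile("/home/[^\\s:;,\"']+");

    /** Masks /home paths and any identifier starting with one of the given package prefixes. */
    static String redactLine(String line, String... secretPrefixes) {
        String out = HOME_PATH.matcher(line).replaceAll("<path>");
        for (String prefix : secretPrefixes) {
            // Matches the prefix plus the rest of the qualified name (dots, slashes, $ for inner classes).
            out = out.replaceAll(Pattern.quote(prefix) + "[\\w.$/]*", "<redacted>");
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java HsErrRedactor <hs_err log> [package prefix ...]");
            return;
        }
        String[] prefixes = new String[args.length - 1];
        System.arraycopy(args, 1, prefixes, 0, prefixes.length);
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(redactLine(line, prefixes));
            }
        }
    }
}
```

Native frames such as `[libjvm.so+0x4bc080]` pass through untouched, which is the part the JVM engineers are most likely to need.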

It's impossible for you to really judge if the bug is "important" or not for others unless you can know what actually causes it. Reporting it might be the first step in Sun's engineers finding out the cause of something serious.

matt b
@matt b: yup, was thinking about clearing the filenames in the hs_err_...log. I'll see if a Proguarded version also triggers the crash and then I may even send the obfuscated .jar + data allowing to reproduce the issue. Still scratching my head on this.
Webinator
+1  A: 

I had a similar problem after upgrading to JDK 1.6.0_18, and it seems to be solved by using the following options:

-server
-Xms256m
-Xmx748m
-XX:MaxPermSize=128m

-verbose:gc
-XX:+PrintGCTimeStamps
-Xloggc:/tmp/gc.log
-XX:+PrintHeapAtGC
-XX:+PrintGCDetails
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath="/tmp"

-XX:+UseParallelGC
-XX:-UseGCOverheadLimit

# The following options are only for remote monitoring with jconsole, useful to see JVM behaviour at runtime
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=12345
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Djava.rmi.server.hostname=MyHost

I still haven't double-checked (it is a production environment), but I think the error was due to two reasons:

1) Wrong heap and/or permanent generation settings (I think JDK 1.6 needs more heap and permanent space than previous JVM versions) caused an OutOfMemoryError, but

2) in the original, incorrect settings somebody wrote

-XX:+HeapDumpOnOutOfMemoryError="/tmp"

and not

-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath="/tmp"

so the JVM was probably not able to write the heap dump and we only got a SIGSEGV (previous versions wrote the heap dump in the working directory).

Check the -server -XX:+UseParallelGC -XX:-UseGCOverheadLimit options too. I think playing with VM parameters is not a workaround but the right approach, because the garbage collector (among other things) changed between 1.5 and 1.6.
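To see whether heap or permanent generation is actually running low, and which options the JVM really received, something like the following sketch can be called from inside the app. It uses only the standard `java.lang.management` API. The class name `MemoryPoolReport` and the `describe` helper are invented for the example, and the pool names printed depend on the JVM version and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

class MemoryPoolReport {
    private static final long MB = 1024 * 1024;

    /** Formats one pool; a negative max means the JVM reports no defined maximum. */
    static String describe(String poolName, long usedBytes, long maxBytes) {
        long usedMb = usedBytes / MB;
        if (maxBytes < 0) {
            return poolName + ": " + usedMb + " MB used (no max defined)";
        }
        return poolName + ": " + usedMb + " MB used of " + (maxBytes / MB) + " MB";
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            System.out.println(describe(pool.getName(), usage.getUsed(), usage.getMax()));
        }
        // Shows the options the JVM was actually started with.
        System.out.println("JVM arguments: "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```

Printing the input arguments is a quick way to check that the intended options were really passed on the command line.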

glenti
@glenti: +1, cool, your first answer on SO was to one of my questions :) Tried everything you suggested but it's still crashing. There's no sign of an OutOfMemoryError, I tried with a custom JLabel displaying the memory usage. Apparently no PermGen issue either.
Webinator
@glenti: your post got me thinking... I'm using a Linux machine with more than one proc and more than 2GB of mem, so Java is defaulting to the -server mode. Interestingly enough if I force "-client" there's no crash anymore... (I'm still not too sure what to do with my reproducible SIGSEGV but it's interesting nonetheless)
Webinator