views:

326

answers:

2

We have a java process running as a daemon (under jsvc). Every several days it just stops doing any work; output to the logfile stops (it is pretty verbose, on 5-minute intervals) and it consumes no CPU or IO.

There are no exceptions logged in the logfile nor in syserr or sysout. The last log statement is just prior to a db commit being done, but there is no open connection on the db server (MySQL) and reviewing the code, there should always be additional log output after that, even if it had encountered an exception that was going to bubble up.

The most curious thing I find is that in the thread dump (included below), there's no thread in our code at all, and the main thread seems to have no context whatsoever:

"main" prio=10 tid=0x0000000000614000 nid=0x445d runnable [0x0000000000000000]
  java.lang.Thread.State: RUNNABLE

As noted earlier, this is a daemon process running using jsvc, but I don't know if that has anything to do with it (I can restructure the code to also allow running it directly, to test).

Any suggestions on what might be happening here?

Thanks... dwh

Full thread dump:

Full thread dump Java HotSpot(TM) 64-Bit Server VM (14.2-b01 mixed mode):

"MySQL Statement Cancellation Timer" daemon prio=10 tid=0x00002aaaf81b8800 nid=0x447b in Object.wait() [0x00002aaaf6a22000]
   java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on <0x00002aaab5556d50> (a java.util.TaskQueue)
 at java.lang.Object.wait(Object.java:485)
 at java.util.TimerThread.mainLoop(Timer.java:483)
 - locked <0x00002aaab5556d50> (a java.util.TaskQueue)
 at java.util.TimerThread.run(Timer.java:462)

"Low Memory Detector" daemon prio=10 tid=0x00000000006a4000 nid=0x4479 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread1" daemon prio=10 tid=0x00000000006a1000 nid=0x4477 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0x000000000069d000 nid=0x4476 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x000000000069b000 nid=0x4465 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x0000000000678800 nid=0x4464 in Object.wait() [0x00002aaaf61d6000]
   java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on <0x00002aaab54a1cb8> (a java.lang.ref.ReferenceQueue$Lock)
 at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
 - locked <0x00002aaab54a1cb8> (a java.lang.ref.ReferenceQueue$Lock)
 at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
 at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x0000000000676800 nid=0x4463 in Object.wait() [0x00002aaaf60d5000]
   java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on <0x00002aaab54a1cf0> (a java.lang.ref.Reference$Lock)
 at java.lang.Object.wait(Object.java:485)
 at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
 - locked <0x00002aaab54a1cf0> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x0000000000614000 nid=0x445d runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"VM Thread" prio=10 tid=0x0000000000670000 nid=0x4462 runnable 

"GC task thread#0 (ParallelGC)" prio=10 tid=0x000000000061e000 nid=0x445e runnable 

"GC task thread#1 (ParallelGC)" prio=10 tid=0x0000000000620000 nid=0x445f runnable 

"GC task thread#2 (ParallelGC)" prio=10 tid=0x0000000000622000 nid=0x4460 runnable 

"GC task thread#3 (ParallelGC)" prio=10 tid=0x0000000000623800 nid=0x4461 runnable 

"VM Periodic Task Thread" prio=10 tid=0x00000000006a6800 nid=0x447a waiting on condition 

JNI global references: 797

Heap
 PSYoungGen      total 162944K, used 48388K [0x00002aaadff40000, 0x00002aaaf2ab0000, 0x00002aaaf5490000)
  eden space 102784K, 47% used [0x00002aaadff40000,0x00002aaae2e81170,0x00002aaae63a0000)
  from space 60160K, 0% used [0x00002aaaeb850000,0x00002aaaeb850000,0x00002aaaef310000)
  to   space 86720K, 0% used [0x00002aaae63a0000,0x00002aaae63a0000,0x00002aaaeb850000)
 PSOldGen        total 699072K, used 699072K [0x00002aaab5490000, 0x00002aaadff40000, 0x00002aaadff40000)
  object space 699072K, 100% used [0x00002aaab5490000,0x00002aaadff40000,0x00002aaadff40000)
 PSPermGen       total 21248K, used 9252K [0x00002aaab0090000, 0x00002aaab1550000, 0x00002aaab5490000)
  object space 21248K, 43% used [0x00002aaab0090000,0x00002aaab09993e8,0x00002aaab1550000)
+1  A: 

Not all Throwables are Exceptions. Does your error logging code catch Errors (OutOfMemoryError, StackOverflowError, etc)?

richj
A: 

Another couple of possibilities:

  • The exception might be being thrown on a worker thread that does not log exceptions. You can address this by using Thread.setDefaultUncaughtExceptionHandler(...).

  • The exception that is being thrown might override the Throwable.fillInStackTrace() method. (This is a long shot ... but apparently some people do this in a misguided attempt to prevent reverse engineering.)

Stephen C