views:

160

answers:

11

OutOfMemoryError (OOME) belongs to the class of errors that you generally shouldn't try to recover from. But if it is buried in a thread, or someone catches it, an application can end up in a state where it doesn't exit but isn't useful either. Any suggestions on how to prevent this, even when using libraries that may foolishly try to catch Throwable or Error/OOME?

A: 

If it's caught, there's really nothing you can do -- whatever is catching it has the opportunity to recover from the exception (maybe by freeing up memory), which is really the point of it in the first place. Why not edit the source code and rebuild if you have access to it? That's what I'd do :) Unless there's some signing issue. As the other poster suggested (and something I was thinking about initially), AOP on every method call would be the only way to go, but this would be heavy-handed and would also mess up the 'good' OOME catchers.

Chris Dennett
@Chris - actually, the OP's point is that it is generally unsafe to try to recover from an OOME.
Stephen C
A: 

Only thing I can think of is using AOP to wrap every single method (beware to rule out java.*) with a try-catch for OOME and if so, log something and call System.exit() in the catch block.

Not a solution I'd call elegant, though...

brunodecarvalho
Sounds painful ;) Also, there's no way to tell the bits of code that have good OOME catchers and those that have bad ones..
Chris Dennett
It doesn't add that much overhead. Problem is if the try-catch is in the middle of some obscure method, there's nothing you can do. If, however, the exception occurs at some lower level of the library/app and some higher method catches it, AOP would work here. Like I said, can't think of anything else that works for this case :)
brunodecarvalho
"... nothing you can do." Well, actually there might be... if you know the exact method where this happens, again AOP to the rescue: configure a pointcut to completely divert the flow of calling that method (most of the times this is impossible though, especially on instance methods that require context or use instance vars).
brunodecarvalho
This would indeed be painful, but that's mainly because it's not quite the right AOP solution. You can write pointcuts to intercept catch-blocks, meaning your aspect only has to wrap catch(Throwable) and catch(OOME). --- But regardless, I think the ROI for this kind of effort is poor.
RonU
+2  A: 

If some piece of code in your application decides that it wants to try to catch OOMEs and attempt to recover, there is (unfortunately) nothing that you can do to stop it ... apart from AOP heroics that are probably impractical, and definitely are bad for your application's performance and maintainability.

Basically, you have to trust other developers not to do stupid things. Other stupid things that you probably shouldn't try to defend against include:

  • calling System.exit(),
  • calling Thread.stop() and friends,
  • leaking open streams, database connections and so on,
  • spawning lots of threads,
  • randomly squashing (i.e. catching and ignoring) exceptions,
  • etc.

In practice, the way to pick up problems like this in code written by other people is to use code quality checkers, and perform code reviews.

For those who don't already know this, there are a number of reasons why it is a bad idea to try to recover from an OOME:

  1. The OOME might have been thrown while the current thread was in the middle of updating some important data structure. In the general case, the code that catches this OOME has no way of knowing this, and if it tries to "recover" there is a risk that the application will continue with a damaged data structure.

  2. If the application is multi-threaded there is a chance that OOMEs might have been thrown on other threads as well, making recovery even harder.

  3. Even if the application can recover without leaving data structures in an inconsistent state, the recovery may just cause the application to limp along for a few seconds more and then OOME again.

  4. Unless you set the JVM options appropriately, a JVM that has almost run out of memory tends to spend a lot of time garbage collecting in a vain attempt to keep going. Attempting to recover from OOMEs is likely to prolong the agony.

Recovering from an OOME does nothing to address the root cause, which is typically a memory leak, a poorly designed (i.e. memory-wasteful) data structure, and/or launching the application with a heap that is too small.

Stephen C
I think the OP has a clear notion that catching OOME is a bad idea; he probably just ran into a case where this happens and wants to make the whole system shut down rather than just continue crippled. Always good to highlight these points, anyhow :)
brunodecarvalho
well it can be more innocent - thread t1 gets an OOME - and dies. Thread t2 - which doesn't do anything requiring heap keeps running, preventing the JVM from exiting.
Michael Neale
@bruno - 5 minutes with `find` and `grep` should be sufficient to find the offending code. (I'm assuming he has source code access. If he doesn't it might take a bit longer ... using a decompiler.)
Stephen C
Yes - exactly - I do NOT want to recover from OOMEs - it's just that I have noticed that people sometimes, deliberately or otherwise, in various libraries, will make mistakes which prevent OOMEs from percolating up. In multi-threaded code it is easy to do (I am being nice here!).
Michael Neale
@Michael - yes, that's a problem. But maybe the solution to that would be to set a default uncaught exception handler that detects `Error`s and metaphorically pulls the plug on the JVM.
Stephen C
(probably redundant, but adding to the above comment) Thread.currentThread().setUncaughtExceptionHandler(); Way more straightforward than AOP, but also fewer chances to catch the Exception.
brunodecarvalho
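For reference, here is a minimal sketch of the default uncaught exception handler idea from the comments above. The class name `HaltOnError` and the `isFatal` helper are made up for illustration, and treating every `Error` as fatal is an assumption you may want to narrow. It only helps when the `Error` actually escapes the thread: anything a library catches and swallows never reaches the handler, and throwables from tasks run through `ExecutorService.submit()` are captured in the returned `Future` instead.

```java
final class HaltOnError {

    /** Returns true for throwables this sketch treats as fatal (any java.lang.Error). */
    static boolean isFatal(Throwable t) {
        return t instanceof Error;
    }

    /** Installs a JVM-wide handler for throwables that escape a thread's run() method. */
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, throwable) -> {
            if (isFatal(throwable)) {
                // halt() terminates immediately and skips shutdown hooks, which
                // might themselves need memory. Use System.exit(1) if hooks must run.
                Runtime.getRuntime().halt(1);
            } else {
                // Ordinary exceptions: just report them and let the thread die.
                System.err.println("Uncaught exception in thread " + thread.getName());
                throwable.printStackTrace();
            }
        });
    }

    public static void main(String[] args) {
        install();
        // This thread dies with a simulated Error; the handler then halts the JVM.
        new Thread(() -> { throw new OutOfMemoryError("simulated"); }).start();
    }
}
```

Threads that have their own handler set via `setUncaughtExceptionHandler` bypass the default one, so this is a safety net rather than a guarantee.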
RonU
@RonU I mentioned that fact. AOP's chances to solve this are way higher than UEH but it's a complete overkill. I'd still go with it over having some other thread or whatever periodically doing stuff in the heap (trying to raise OOME to shutdown the JVM).
brunodecarvalho
@Rob - yes it is. But if you read all the comments, you will see that it was Michael Neale (the OP) who brought up the problem of OOME's getting lost.
Stephen C
A: 

How about catching OOME yourself in your code and System.exit()?

Adam Schmideg
A deeper catch would still catch it first, and the only difference with not catching is that now you'll quit the whole program instead of just the thread.
Bart van Heukelom
+1  A: 

One more thing I could think of (although I do not know how to implement it) would be to run your app in some kind of debugger. I noticed that my debugger can stop the execution when an exception is thrown. :-)

So maybe one could implement some kind of execution environment to achieve that.

DerMike
A: 

You can run your java program using Java Service Wrapper with an OutOfMemory Detection Filter. However, this assumes that the "bad people" are nice enough to log the error :)

dogbane
A: 

One possibility, which I would love to be talked out of, is to have a stupid thread whose job is to do something on the heap. Should it receive an OOME, it exits the whole JVM.

Please tell me this isn't sensible.

Michael Neale
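To make the idea concrete, a rough sketch of such a canary thread might look like the following. The names `HeapCanary`, `probe` and `start`, the probe size and the interval are all invented for illustration. Note the obvious weakness: a small probe allocation succeeding says little about whether other threads' larger allocations will succeed, so this can easily miss the condition it is meant to detect.

```java
final class HeapCanary {

    /** Attempts one allocation; returns false if it failed with OutOfMemoryError. */
    static boolean probe(int bytes) {
        try {
            byte[] block = new byte[bytes];
            block[0] = 1; // touch the array so the allocation is actually used
            return block[0] == 1;
        } catch (OutOfMemoryError e) {
            return false;
        }
    }

    /** Starts a daemon thread that probes the heap periodically and halts the JVM on failure. */
    static Thread start(int probeBytes, long intervalMillis) {
        Thread canary = new Thread(() -> {
            while (true) {
                if (!probe(probeBytes)) {
                    // Skip shutdown hooks: they may need memory we no longer have.
                    Runtime.getRuntime().halt(1);
                }
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    return; // interruption is treated as a request to stop the canary
                }
            }
        }, "heap-canary");
        canary.setDaemon(true); // the canary itself must not keep the JVM alive
        canary.start();
        return canary;
    }
}
```

Usage would be something like `HeapCanary.start(1 << 20, 5_000)` early in `main`, i.e. try to allocate 1 MiB every five seconds.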
I'd focus on avoiding OOME at all; there's no elegant solution for this problem since it's a consequence of bad practices put to use. Either a) remove the try-catches (assumes access to source) b) use alternate libs c) report the problem and submit a patch d) write your own lib.
brunodecarvalho
Yes - I am working on the OOME of course - the current issue is solved. But it's a general principle - I don't want my app running in a useless state due to erroneous error handling, or prolonging the death of the JVM.
Michael Neale
+1  A: 

If you don't care much how the system exits as long as it does exit, then passing a -XX:+HeapDumpOnOutOfMemoryError JVM switch in the startup script/command might work?

no - doesn't exit - but does dump
Michael Neale
A: 

You could use the MemoryPoolMXBean to be notified when a program exceeds a set heap allocation threshold.

I haven't used it myself but it should be possible to shut down this way when the remaining memory gets low by setting an allocation threshold and calling System.exit() when you receive the notification.

josefx
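A sketch of how that could look, under a few assumptions: it uses the collection usage threshold (heap usage measured right after a garbage collection) rather than the plain usage threshold, since usage after GC seems a better signal that memory is really running out; the class name `LowMemoryExit`, the `thresholdFor` helper and the choice of fraction are made up for illustration. Like the answer above, this is untested in production.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

final class LowMemoryExit {

    /** Threshold in bytes for a pool of the given max size, or -1 if the max is undefined. */
    static long thresholdFor(long maxBytes, double fraction) {
        return maxBytes <= 0 ? -1 : (long) (maxBytes * fraction);
    }

    /** Arms every heap pool that supports it; returns the number of pools armed. */
    static int install(double fraction) {
        int armed = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP
                    || !pool.isCollectionUsageThresholdSupported()) {
                continue;
            }
            long threshold = thresholdFor(pool.getUsage().getMax(), fraction);
            if (threshold > 0) {
                pool.setCollectionUsageThreshold(threshold);
                armed++;
            }
        }
        // The platform MemoryMXBean emits the threshold notifications.
        NotificationListener listener = (notification, handback) -> {
            if (MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED
                    .equals(notification.getType())) {
                System.err.println("Heap still nearly full after GC, exiting");
                System.exit(1);
            }
        };
        NotificationEmitter emitter =
                (NotificationEmitter) ManagementFactory.getMemoryMXBean();
        emitter.addNotificationListener(listener, null, null);
        return armed;
    }
}
```

Calling `LowMemoryExit.install(0.95)` at startup would exit once any armed heap pool is still more than 95% full after a collection. Which pools support thresholds depends on the garbage collector in use, so check the return value.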
+2  A: 
  1. edit OutOfMemoryError.java, add System.exit() in its constructors.

  2. compile it. (interestingly javac doesn't care it's in package java.lang)

  3. add the class into JRE rt.jar

  4. now jvm will use this new class. (evil laughs)

This is a possibility you might want to be aware of. Whether it's a good idea, or even legal, is another question.

irreputable
That's legal, but it is not something you would **ever** want to do in production code.
Stephen C
Re - *"interestingly javac doesn't care it's in package java.lang"*. Well how else would you compile Java code in `java.lang`? Yes, there are security checks to stop a normal application JAR (etc) from replacing "java.lang.*" etc classes, but these have to be enforced by the class loader / security manager. (If you relied on the Java compiler to do the enforcement, it would be relatively simple to subvert.)
Stephen C
+2  A: 

Another approach is to use the flag

-XX:OnOutOfMemoryError="<cmd args>; <cmd args>"

Definition: Run user-defined commands when an OutOfMemoryError is first thrown. (Introduced in 1.4.2 update 12, 6)

See http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html

Michael Neale
Hmm ... that might work, depending on what you want the commands to do. Reporting errors would be fine, but `kill -9`-ing the JVM might have nasty side-effects.
Stephen C