What is the best practice when dealing with Errors within a server application?

In particular, how do you think an application should handle errors like OutOfMemoryError?

I'm particularly interested in Java applications running within Tomcat, but I think this is a more general problem.


The reason I'm asking is that I am reviewing a web application that frequently throws OOMEs but usually just logs them and then continues executing. That, obviously, results in more OOMEs.

While that is certainly bad practice in my opinion, I'm not entirely sure that stopping the server would be the best solution.

+2  A: 

There is not much you can do to fix an OutOfMemoryError except to clean up the code and adjust the JVM memory settings (but if you have a leak somewhere, that is just a band-aid).

If you don't have access to the source code and/or are not willing to fix it, an external solution is to use some sort of watchdog program that monitors the Java application and restarts it automatically when it detects OOMEs. Here is a link to one such program.

Of course, this assumes that the application can survive restarts.
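A minimal sketch of such a watchdog in Java, assuming the application runs as a separate JVM started with HotSpot's `-XX:+ExitOnOutOfMemoryError` flag (available since JDK 8u92), so that an OOME terminates the process instead of leaving it running in a broken state. The `Watchdog` class, the `app.jar` path and the restart/back-off limits are illustrative assumptions, not the program linked above:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

public class Watchdog {

    /**
     * Starts the command and restarts it whenever it exits with a non-zero
     * status, up to maxRestarts times. Returns how many times it was started.
     */
    public static int supervise(List<String> command, int maxRestarts, long backoffMillis) {
        int starts = 0;
        try {
            while (true) {
                Process child = new ProcessBuilder(command).inheritIO().start();
                starts++;
                int exitCode = child.waitFor();
                if (exitCode == 0) {
                    return starts; // clean shutdown: treat as intentional, do not restart
                }
                System.err.println("watchdog: child exited with status " + exitCode);
                if (starts > maxRestarts) {
                    return starts; // give up; a human needs to look at this
                }
                Thread.sleep(backoffMillis); // brief back-off before restarting
            }
        } catch (IOException e) {
            throw new UncheckedIOException("could not start " + command, e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status for the caller
            return starts;
        }
    }

    public static void main(String[] args) {
        // Hypothetical application jar and heap size; adjust to your setup.
        List<String> command = List.of(
                "java", "-Xmx512m", "-XX:+ExitOnOutOfMemoryError", "-jar", "app.jar");
        supervise(command, 5, 1000);
    }
}
```

A production watchdog would also need to detect hangs (for example by polling a health-check URL), which this sketch does not attempt.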

Gregory Mostizky
+1  A: 

Well, if you get an OOME, the best thing to do is release as many resources (especially cached ones) as possible. Restarting the web app (if it is the web app's fault) or the web server itself (if something else in the server is causing it) would recover from this state.

On the development front, though, it would be good to profile the app and see what is taking up the space. Sometimes resources are referenced from a class (static) variable and hence never collected; sometimes it is something else. In the past we had problems where Tomcat wouldn't release the classes of previous versions of an app when we replaced it with a newer version. We partly solved the problem by nulling out class variables or refactoring so as not to use them at all, but some leaks remained.
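One way to make cached data releasable without explicit cleanup code is to hold cache values through `java.lang.ref.SoftReference`: the JVM guarantees that all soft references to softly reachable objects are cleared before it throws an `OutOfMemoryError`. A minimal sketch (the `SoftCache` class and its loader function are hypothetical):

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Cache whose values the GC may reclaim under memory pressure. */
public class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> entries = new ConcurrentHashMap<>();
    private final Function<? super K, ? extends V> loader;

    public SoftCache(Function<? super K, ? extends V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        SoftReference<V> ref = entries.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Never cached, or cleared by the GC: reload it. Two threads may
            // occasionally load the same key concurrently, which is harmless here.
            value = loader.apply(key);
            entries.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

This assumes the loader never returns `null`. Cleared entries also stay in the map as empty references until their key is requested again; a fuller version would drain them with a `ReferenceQueue`.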

inkredibl
A: 

I'm not an expert in such things, but I'll venture a rough opinion on this problem.

Generally, I think there are two main approaches:

  1. The server is stopped.
  2. The server degrades gracefully: it reduces throughput to cut memory consumption but stays alive. For this, the application must have an appropriate architecture, I think.
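The second approach can be approximated with admission control: cap how many memory-hungry jobs run at once and reject the rest immediately instead of letting them pile up on the heap. A minimal sketch, assuming the heavy work can be wrapped in a `Supplier` and that rejecting excess requests (for instance with an HTTP 503) is acceptable; `LoadShedder` and its limit are hypothetical:

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public class LoadShedder {
    private final Semaphore permits;

    public LoadShedder(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Runs the job if a permit is free; otherwise returns the fallback at once. */
    public <T> T runOrReject(Supplier<T> job, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback; // shed load, e.g. answer "503 Service Unavailable"
        }
        try {
            return job.get();
        } finally {
            permits.release(); // always give the permit back, even if the job throws
        }
    }
}
```

Capping concurrency bounds peak memory only if each job's footprint is roughly bounded; it does nothing against a genuine leak.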
Rorick
A: 

According to the javadoc about a java.lang.Error:

An Error is a subclass of Throwable that indicates serious problems that a reasonable application should not try to catch. Most such errors are abnormal conditions. The ThreadDeath error, though a "normal" condition, is also a subclass of Error because most applications should not try to catch it.

A method is not required to declare in its throws clause any subclasses of Error that might be thrown during the execution of the method but not caught, since these errors are abnormal conditions that should never occur.

So the best practice when dealing with subclasses of Error is to fix the problem that is causing them, not to "handle" them. As the documentation clearly states, they should never occur.

In the case of an OutOfMemoryError, maybe you have a process that consumes lots of memory (e.g. generating reports) and your JVM heap is not sized for it, or maybe you have a memory leak somewhere in your application, etc. Whatever it is, find the problem and fix it; don't handle it.
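To tell an undersized heap from a leak, it helps to watch heap usage over time. A minimal sketch using the standard `java.lang.management` API (the `HeapReport` class is hypothetical):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapReport {

    /** Percentage of the maximum heap currently in use, or -1 if the maximum is undefined. */
    public static double heapUsedPercent() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when no maximum is defined
        if (max <= 0) {
            return -1;
        }
        return 100.0 * heap.getUsed() / max;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("heap used=%d MB committed=%d MB max=%d MB (%.1f%%)%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20,
                heapUsedPercent());
    }
}
```

If usage right after garbage collections keeps climbing toward the maximum, a leak is more likely than a sizing problem. Starting the JVM with `-XX:+HeapDumpOnOutOfMemoryError` writes a heap dump when an OOME occurs, which can then be inspected in a heap analyser to see what is holding the memory.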

Pascal Thivent
Correct in theory, and very wrong in practice. In a server app where every minute of downtime costs real money, you do everything you possibly can to handle *every* error. Fixing the cause is not something you can do immediately (though you should of course do it as soon as possible, so handling the error should include sending out urgent notifications), but lost opportunity costs *do* begin immediately.
Michael Borgwardt
Of course, sending a notification is a good thing to do, but it shouldn't be implemented in the application; it's a supervision/monitoring issue. Then, if you need high availability (i.e. no downtime), run a cluster, not a single app server instance. Finally, the OP is asking for a best practice, and I think the best practice is to fix the problem, not to handle it or send a notification about it.
Pascal Thivent
I'd say the best practice is to do both as far as possible. Running a cluster increases cost and complexity disproportionately.
Michael Borgwardt
Come on, in a world "where every minute of downtime costs real money", use a fault tolerant architecture. If fault tolerance isn't necessary, then downtime doesn't cost that much.
Pascal Thivent
+2  A: 

The application shouldn't handle OOM at all - that should be the server's responsibility.

Next step: Check if memory settings are appropriate. If they aren't, fix them; if they are, fix the application. :)

gustafc
What do you mean by "the server"? The hardware or OS can't really do anything, but an application server might. If the application does not run inside an application server, it has to handle the error itself.
Michael Borgwardt
@Michael, I meant the app server (which is Tomcat, to judge from the OP).
gustafc
A: 

An OutOfMemoryError is by no means always unrecoverable - it may well be the result of a single bad request, and depending on the app's structure it may just abandon processing the request and continue processing others without any problems.

So if your architecture supports it, catch the Error at a point where you have a chance to stop doing what caused it and continue doing something else - for an app server, this would be at the point that dispatches requests to individual app instances.
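A minimal sketch of catching the error at the dispatch point, assuming each request is handled synchronously by a hypothetical `Handler`; this is an illustration, not Tomcat's actual dispatcher. Once the failed request's stack unwinds, whatever it allocated becomes unreachable and can be collected before the next request runs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class RequestDispatcher {
    private static final Logger LOG = Logger.getLogger(RequestDispatcher.class.getName());

    /** Hypothetical per-request handler. */
    public interface Handler {
        String handle(String request);
    }

    /** Dispatches each request; an OOME abandons only the request that caused it. */
    public static List<String> dispatchAll(List<String> requests, Handler handler) {
        List<String> responses = new ArrayList<>();
        for (String request : requests) {
            try {
                responses.add(handler.handle(request));
            } catch (OutOfMemoryError e) {
                // The failed request's allocations are now unreachable. If the heap
                // is exhausted by something else, this logging may itself fail.
                LOG.log(Level.SEVERE, "Out of memory while handling " + request, e);
                responses.add("500 Internal Server Error");
            }
        }
        return responses;
    }
}
```

This only helps when the OOME was caused by that request's own allocations. If the heap is full because of a leak, the next request will fail the same way, which is the situation described in the question.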

Of course, you should also make sure that this does not go unnoticed and a real fix can be implemented as soon as possible, so the app should log the error AND send out some sort of warning (e.g. email, but preferably something harder to ignore or get lost). If something goes wrong during that, then shutting down is the only sensible thing left to do.
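The notify-or-shut-down policy might look like this. A sketch under the assumption that the alert channel is passed in as a `Consumer` (hypothetical; no particular mail or paging library is implied) and that the shutdown action is injectable so it can be tested:

```java
import java.util.function.Consumer;

public class OomeEscalation {

    /**
     * Reports the error through the notifier. If reporting itself fails, runs
     * the shutdown action, since the process can no longer be trusted.
     * Returns true if the notification was delivered.
     */
    public static boolean notifyOrShutdown(OutOfMemoryError error,
                                           Consumer<OutOfMemoryError> notifier,
                                           Runnable shutdown) {
        try {
            notifier.accept(error);
            return true;
        } catch (Throwable notificationFailure) {
            shutdown.run();
            return false;
        }
    }

    public static void main(String[] args) {
        notifyOrShutdown(new OutOfMemoryError("simulated"),
                e -> System.err.println("ALERT: " + e), // stand-in for a real alert channel
                () -> Runtime.getRuntime().halt(1));    // halt() skips shutdown hooks
    }
}
```

`main` uses `Runtime.getRuntime().halt(1)` rather than `System.exit(1)` because it skips shutdown hooks, which might themselves need memory; whether that is preferable depends on what those hooks do.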

Michael Borgwardt
+1  A: 

@Michael Borgwardt, you can't recover from an OutOfMemoryError in Java. Other errors might not stop the application, but an OutOfMemoryError literally hangs applications.