views:

994

answers:

3

I'm using a java.util.concurrent.ExecutorService that I obtained by calling Executors.newSingleThreadExecutor(). This ExecutorService can sometimes stop processing tasks, even though it has not been shutdown and continues to accept new tasks without throwing exceptions. Eventually, it builds up enough of a queue that my app shuts down with OutOfMemoryError exceptions.

The docs seem to indicate that this single thread executor should survive task processing errors by firing up a new thread worker thread if necessary to replace one that has died. Am I missing something?

Thanks!

+3  A: 

It sounds like you have two different issues:

1) You're over-feeding the work queue. You can't just keep stuffing new tasks into the queue, with no regard for the consumption rate of the task executors. You need to figure out some logic for knowing when you to block new additions to the work queue.

2) Any uncaught exception in a task's thread can completely kill the thread. When that happens, the ExecutorService spins up a new thread to replace it. But that doesn't mean you can ignore whatever problem is causing the thread to die in the first place! Find those uncaught exceptions and catch them!

This is just a hunch (cuz there's not enough info in your post to know otherwise), but I don't think your problem is that the task executor stops processing tasks. My guess is that it just doesn't process tasks as fast as you're creating them. (And the fact that your tasks sometimes die prematurely is probably orthogonal to the problem.)

At least, that's been my experience working with thread pools and task executors.


Okay, here's another possibility that sounds feasible based on your comment (that everything will run smoothly for hours until suddenly coming to a crashing halt)...

You might have a rare deadlock between your task threads. Most of the time, you get lucky, and the deadlock doesn't manifest itself. But occasionally, two or more of your task threads get into a state where they're waiting for the release of a lock held by the other thread. At that point, no more task processing can take place, and your work queue will pile up until you get the OutOfMemoryError.

Here's how I'd diagnose that problem:

Eliminate ALL shared state between your task threads. At first, this might require each task thread making a defensive copy of all shared data structures it requires. Once you've done that, it should be completely impossible to experience a deadlock.

At this point, gradually reintroduced the shared data structures, one at a time (with appropriate synchronization). Re-run your application after each tiny modification to test for the deadlock. When you get that crashing situation again, take a close look at the access patterns for the shared resource and determine whether you really need to share it.

As for me, whenever I write code that processes parallel tasks with thread pools and executors, I always try to eliminate ALL shared state between those tasks. As far as the application is concerned, they may as well be completely autonomous applications. Hunting down deadlocks is a drag, and in my experience, the best way to eliminate deadlocks is for each thread to have its own local state rather than sharing any state with other task threads.

Good luck!

benjismith
Good points. I'm working the exception angle for now... I don't think its a rate problem, at least not at this point. It will run for hours at a time, with tasks getting executing quickly and steadily until it suddenly just stops.
rcalder816
It turns out to have been a rate problem after all. The application would occasionally encounter periods of activity high enough to build up enough backlog that the executor would never catch up, eventually consuming all available memory. Thanks for your help!
rcalder816
woohoo!! I love being right!!! ....and I'm glad you were able to find and fix the problem :o)
benjismith
A: 

Although you don't leave enough detail to be sure, the first thing I'd try is to have your tasks catch "Exception" at the top level and log the message.

I know it doesn't seem right, but occasionally (depending on a lot of variables) I've worked on code where stuff happening in a thread throws an exception and it is never logged, or it just doesn't show up on the console--yet the "executing" code exits out of it's top level loop or whatever code is causing your task to run.

I guess I'm just saying, make sure your tasks are not throwing an exception out.

Bill K
+3  A: 

My guess would be that your tasks are blocking indefinitely, rather than dying. Do you have evidence, such as a log statement at the end of your task, suggest that your tasks are successfully completing?

This could be a deadlock, or an interaction with some external process that is blocking.

erickson