It sounds like you have two different issues:
1) You're over-feeding the work queue. You can't just keep stuffing new tasks into the queue, with no regard for the consumption rate of the task executors. You need to figure out some logic for knowing when you to block new additions to the work queue.
2) Any uncaught exception in a task's thread can completely kill the thread. When that happens, the ExecutorService spins up a new thread to replace it. But that doesn't mean you can ignore whatever problem is causing the thread to die in the first place! Find those uncaught exceptions and catch them!
This is just a hunch (cuz there's not enough info in your post to know otherwise), but I don't think your problem is that the task executor stops processing tasks. My guess is that it just doesn't process tasks as fast as you're creating them. (And the fact that your tasks sometimes die prematurely is probably orthogonal to the problem.)
At least, that's been my experience working with thread pools and task executors.
Okay, here's another possibility that sounds feasible based on your comment (that everything will run smoothly for hours until suddenly coming to a crashing halt)...
You might have a rare deadlock between your task threads. Most of the time, you get lucky, and the deadlock doesn't manifest itself. But occasionally, two or more of your task threads get into a state where they're waiting for the release of a lock held by the other thread. At that point, no more task processing can take place, and your work queue will pile up until you get the OutOfMemoryError.
Here's how I'd diagnose that problem:
Eliminate ALL shared state between your task threads. At first, this might require each task thread making a defensive copy of all shared data structures it requires. Once you've done that, it should be completely impossible to experience a deadlock.
At this point, gradually reintroduced the shared data structures, one at a time (with appropriate synchronization). Re-run your application after each tiny modification to test for the deadlock. When you get that crashing situation again, take a close look at the access patterns for the shared resource and determine whether you really need to share it.
As for me, whenever I write code that processes parallel tasks with thread pools and executors, I always try to eliminate ALL shared state between those tasks. As far as the application is concerned, they may as well be completely autonomous applications. Hunting down deadlocks is a drag, and in my experience, the best way to eliminate deadlocks is for each thread to have its own local state rather than sharing any state with other task threads.
Good luck!