I am using an NLP library (Stanford NER) that throws OOM errors for rare input documents.
I plan to eventually isolate these documents and figure out what about them causes the errors, but this is hard to do (I'm running in Hadoop, so I just know the error occurs 17% through split 379/500 or something like that). As an interim solution, I'd like to be able to apply a CPU and memory limit to this particular call.
I'm not sure what the best way to do this would be. My first thought is to create a fixed thread pool of one thread and use the timed get() on a Future. This would at least give me a wall-clock limit, which would likely help somewhat.
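For reference, here's a minimal sketch of that timed-get idea. The `Callable` body here is just a placeholder for the actual per-document NER call, and `callWithTimeout` is a hypothetical helper name, not anything from Stanford NER or Hadoop:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedCall {
    // Run one task on a single-thread pool with a wall-clock limit;
    // returns null if the task times out or fails.
    static String callWithTimeout(Callable<String> task, long seconds) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> future = pool.submit(task);
        try {
            return future.get(seconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            // Interrupt the worker; this only stops it if the library
            // ever checks its interrupt status.
            future.cancel(true);
            return null; // treat this document as skipped
        } catch (InterruptedException | ExecutionException e) {
            return null;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Placeholder for the real NER call on one document.
        String result = callWithTimeout(() -> "tagged", 2);
        System.out.println(result);
    }
}
```

Note the two caveats this sketch makes visible: it bounds wall-clock time only (no CPU or memory cap), and `Future.cancel(true)` merely interrupts the worker thread, so a library that never checks interrupts will keep running, and a genuine OOM can still take down the whole JVM.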
My question is whether there is any way to do better than this with a reasonable amount of effort.