I'm working on a library where I'm farming various tasks out to some third-party libraries that do some relatively sketchy or dangerous platform-specific work. (Specifically, I'm writing a mathematical function parser that calls JIT compilers, like LLVM or libjit, to build machine code.) In practice, these third-party libraries have a tendency to be crashy (part of this is my fault, of course, but I still want some insurance).

I'd like, then, to be able to very gracefully deal with a job dying horribly -- SIGSEGV, SIGILL, etc. -- without bringing down the rest of my code (or the code of the users calling my library functions). To be clear, I don't care if that particular job can continue (I'm not going to try to repair a crash condition), nor do I really care about the state of the objects after such a crash (I'll discard them immediately if there's a crash). I just want to be able to detect that a crash has occurred, stop the crash from taking out the entire process, stop calling whatever's crashing, and resume execution.

(For a little more context, the code at the moment is a for loop, testing each of the available JIT-compilers. Some of these compilers might crash. If they do, I just want to execute continue; and get on with testing another compiler.)

Currently, I've got a signal()-based implementation that fails pretty horribly; of course, it's undefined behavior to longjmp() out of a signal handler, and signal handlers are pretty much expected to end with exit() or terminate(). Just throwing the code in another thread doesn't help by itself, at least the way I've tested it so far. I also can't hack out a way to make this work using C++ exceptions.

So, what's the best way to insulate a particular set of instructions / thread / job from crashes?

+9  A: 

Spawn a new process.
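
On a POSIX system, this could be sketched with fork() and waitpid(), roughly as below. The names Job, run_isolated, first_working, good_job and crashy_job are made up for illustration and are not from the question; crashy_job just raises SIGSEGV to simulate a crashing JIT back end.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>
#include <csignal>
#include <cstddef>
#include <vector>

// Hypothetical job type: returns normally on success, may crash.
using Job = void (*)();

// Runs `job` in a child process.
// Returns 0 if the child exited cleanly, the signal number if the child
// was killed by a signal, or -1 for any other failure.
int run_isolated(Job job) {
    pid_t pid = fork();
    if (pid < 0) return -1;               // fork itself failed
    if (pid == 0) {
        job();                            // child: run the risky code
        _exit(0);                         // leave without running atexit handlers
    }
    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1;    // retry only if interrupted
    }
    if (WIFSIGNALED(status)) return WTERMSIG(status);
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0) return 0;
    return -1;
}

// Mirrors the loop in the question: returns the index of the first job
// that survives, or -1 if every job crashed.
int first_working(const std::vector<Job>& jobs) {
    for (std::size_t i = 0; i < jobs.size(); ++i) {
        if (run_isolated(jobs[i]) != 0) continue;   // crashed: try the next one
        return static_cast<int>(i);
    }
    return -1;
}

// Hypothetical stand-ins for JIT back ends.
void good_job() {}
void crashy_job() { raise(SIGSEGV); }     // simulate a crash
```

Keep in mind that the child has its own copy of memory, so anything it builds has to be sent back to the parent (or rebuilt there), and that calling fork() from a process that already runs several threads needs extra care.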

alex
This is the only way to do it. A thread can corrupt memory anywhere in the process, so after a SEGV, you can't guarantee that your memory is unaffected.
KeithB
Thanks for the heads-up. Almost certainly the right answer here. I'm off to read up on fork() and company.
Charles Pence
+5  A: 

What output do you collect when a job succeeds?

I ask because if the output is low bandwidth I would be tempted to run each job in its own process.

Each of these crashy jobs you fire up has a high chance of corrupting memory used elsewhere in your process.

Processes offer the best protection.
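
For low-bandwidth output, one possible shape is a pipe between parent and child. This is only a sketch for POSIX systems; MathJob, run_in_child, square and crashy are hypothetical names, and the "result" here is a single double.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdlib>

// Hypothetical job type: computes one double from one double.
using MathJob = double (*)(double);

// Runs job(x) in a child process and sends the result back over a pipe.
// Returns true and fills `out` only if the child exited cleanly and a
// complete result was received.
bool run_in_child(MathJob job, double x, double& out) {
    int fds[2];
    if (pipe(fds) != 0) return false;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    if (pid == 0) {
        close(fds[0]);                    // child only writes
        double result = job(x);
        ssize_t n = write(fds[1], &result, sizeof result);
        _exit(n == static_cast<ssize_t>(sizeof result) ? 0 : 1);
    }
    close(fds[1]);                        // parent only reads
    double result = 0.0;
    // If the child dies before writing, read() sees end-of-file and returns 0.
    ssize_t n = read(fds[0], &result, sizeof result);
    close(fds[0]);
    int status = 0;
    bool reaped = waitpid(pid, &status, 0) == pid;
    bool ok = reaped && n == static_cast<ssize_t>(sizeof result) &&
              WIFEXITED(status) && WEXITSTATUS(status) == 0;
    if (ok) out = result;
    return ok;
}

// Hypothetical jobs for illustration.
double square(double x) { return x * x; }
double crashy(double) { std::abort(); }  // simulate a crash
```

If the job crashes, the parent just sees a failed call and can move on; larger outputs would need a read loop or a different channel.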

morechilli
+1  A: 

Processes offer the best protection, but it's possible you can't do that.

If your threads' entry points are functions you wrote (for example, ThreadProc in the Windows world), then you can wrap them in try{...}catch(...) blocks. If you want to communicate that an exception has occurred, then you can communicate specific error codes back to the main thread or use some other mechanism. If you want to log not only that an exception has occurred but what that exception was, then you'll need to catch specific exception types and extract diagnostic information from them to communicate back to the main thread. For example:

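#include <exception>
#include <string>

// Assumed to be defined elsewhere: reports the failure to the main thread.
void tell_main_thread_what_went_wrong(const std::string& reason);

int my_tempermental_thread()
{
  try
  {
    // ... magic happens ...
    return 0;
  }
  catch( const std::exception& ex )
  {
    // ... or maybe it doesn't ...
    std::string reason = ex.what();
    tell_main_thread_what_went_wrong(reason);
    return 1;
  }
  catch( ... )
  {
    // ... definitely not magical happenings here ...
    tell_main_thread_what_went_wrong("uh, something bad and undefined");
    return 2;
  }
}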

Be aware that if you go this way you run the risk of hosing the host process when the exceptions do occur. You say you're not trying to correct the problem, but how do you know the malignant thread didn't eat your stack, for example? Catch-and-ignore is a great way to create horribly confounding bugs.
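
One possible "other mechanism" for getting the failure back to the main thread is std::exception_ptr. The sketch below assumes C++11 threads, and run_and_report is a hypothetical helper name. Note that on POSIX systems a SIGSEGV is a signal rather than a C++ exception, so this pattern only covers failures the library reports by throwing.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <thread>

// Runs `work` on a separate thread and waits for it.
// Returns an empty string on success, otherwise a description of the
// exception that escaped `work`.
template <typename Work>
std::string run_and_report(Work work) {
    std::exception_ptr error;             // written by the worker, read after join()
    std::thread worker([&] {
        try {
            work();
        } catch (...) {
            error = std::current_exception();   // capture instead of terminating
        }
    });
    worker.join();
    if (!error) return "";
    try {
        std::rethrow_exception(error);    // re-inspect the exception on this thread
    } catch (const std::exception& ex) {
        return ex.what();
    } catch (...) {
        return "unknown exception";
    }
}
```

Called as run_and_report([] { throw std::runtime_error("jit failed"); }), this hands "jit failed" back to the caller instead of letting the exception escape the thread.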

John Dibling
A: 

On Windows, you might be able to use VirtualProtect(YourMemory, Size, PAGE_READONLY, &OldProtect) when calling the untrusted code. Any attempt to modify this memory would cause a Structured Exception. You can safely catch this and continue execution. However, memory allocated by that library will of course leak, as will other resources. The Linux equivalent is mprotect(YourMemory, Length, PROT_READ), which causes a SEGV on any write to the protected pages.
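
On Linux, the protect/unprotect steps could be wrapped in a small RAII helper, something like the sketch below. ReadOnlyGuard and alloc_page are hypothetical names; the address passed to mprotect() has to be page-aligned, so the memory here comes from mmap() rather than from new or malloc().

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// Hypothetical RAII helper: makes [addr, addr + len) read-only for the
// lifetime of the guard, then restores read/write access.
class ReadOnlyGuard {
public:
    ReadOnlyGuard(void* addr, std::size_t len) : addr_(addr), len_(len), ok_(false) {
        ok_ = mprotect(addr_, len_, PROT_READ) == 0;
    }
    ~ReadOnlyGuard() {
        if (ok_) mprotect(addr_, len_, PROT_READ | PROT_WRITE);
    }
    ReadOnlyGuard(const ReadOnlyGuard&) = delete;
    ReadOnlyGuard& operator=(const ReadOnlyGuard&) = delete;

    // True if the region was successfully made read-only.
    bool ok() const { return ok_; }

private:
    void* addr_;
    std::size_t len_;
    bool ok_;
};

// Allocates one zero-filled, page-aligned page, or returns nullptr on failure.
int* alloc_page() {
    long page = sysconf(_SC_PAGESIZE);
    void* p = mmap(nullptr, static_cast<std::size_t>(page),
                   PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : static_cast<int*>(p);
}
```

While a guard is alive, reads of the page still work and a write raises SIGSEGV; once the guard goes out of scope the page is writable again.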

MSalters