Let's say that I have a library which runs 24x7 on certain machines. Even if the code is rock solid, a hardware fault can sooner or later trigger an exception. I would like to have some sort of failsafe in place for events like this. One approach would be to write wrapper functions that encapsulate each API call like this:

returnCode = DEFAULT;
try
{
    returnCode = libraryAPI1();
}
catch (...)
{
    returnCode = BAD;
}
return returnCode;

The caller of the library then restarts the whole thread and reinitializes the module if returnCode is BAD.
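
To avoid repeating that try/catch around every API, the wrapper could be written once as a template. This is only a sketch: guardedCall and the ReturnCode values are hypothetical names, and it assumes every library API returns an int. Note that catch (...) in standard C++ only sees C++ exceptions, so a fault that is delivered some other way will not end up here.

```cpp
#include <cassert>
#include <exception>
#include <iostream>
#include <stdexcept>

// Hypothetical return codes, mirroring DEFAULT and BAD from the question.
enum ReturnCode { DEFAULT = 0, BAD = -1 };

// Calls api() and converts any C++ exception into BAD.
// Assumption: api is callable with no arguments and returns an int.
template <typename Api>
int guardedCall(Api&& api) {
    int returnCode = DEFAULT;
    try {
        returnCode = api();
    } catch (const std::exception& e) {
        // Standard exceptions carry a message worth logging.
        std::cerr << "library call failed: " << e.what() << '\n';
        returnCode = BAD;
    } catch (...) {
        // Anything else that was thrown (ints, custom types, ...).
        std::cerr << "library call failed with an unknown exception\n";
        returnCode = BAD;
    }
    return returnCode;
}
```

A caller would then write something like guardedCall([] { return libraryAPI1(); }) and restart the thread when the result is BAD. The wrapper only contains the failure; it does nothing about whatever state the library left behind.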

Things CAN go horribly wrong. For example, suppose the try block (or libraryAPI1()) contained:

 func1();
 char *x=malloc(1000);
 func2();

If func2() throws an exception, x will never be freed. In a similar vein, file corruption is a possible outcome.

Could you please tell me what other things can possibly go wrong in this scenario?

+2  A: 

Do you have control over the libraryAPI implementation?

If it fits an OO model, you should design it using the RAII pattern, which guarantees that the destructor (which releases the acquired resources) is invoked when an exception propagates.

Using resource-management helpers such as smart pointers helps too:

try
{
    someNormalFunction();
    cSmartPtr<BYTE> pBuf = malloc(1000);
    someExceptionThrowingFunction();    
}
catch(...)
{
    // Do logging and other necessary actions
    // but no cleaning required for <pBuf>
}
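
The same idea with only the standard library might look like the sketch below. It assumes C++11 or later; CountingFree, g_freed and runWithBuffer are hypothetical names, and the counter exists only to make the cleanup observable.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>
#include <stdexcept>

int g_freed = 0;  // hypothetical counter, only here to show that cleanup ran

// Deleter that releases malloc'ed memory with free and records the call.
struct CountingFree {
    void operator()(void* p) const noexcept {
        std::free(p);
        ++g_freed;
    }
};

// Allocates a buffer, optionally throws, and reports whether it succeeded.
// The buffer is released on both the normal path and the exception path.
bool runWithBuffer(bool shouldThrow) {
    try {
        std::unique_ptr<char, CountingFree> buf(
            static_cast<char*>(std::malloc(1000)));
        if (!buf) return false;  // allocation failed, nothing to free
        if (shouldThrow) throw std::runtime_error("simulated failure");
        return true;
    } catch (...) {
        // Logging would go here; buf has already been freed by its deleter.
        return false;
    }
}
```

By the time the catch block runs, the unique_ptr has already gone out of scope, so there is nothing left to clean up by hand.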
YeenFei
Yes, I do have the source code. My question is not how to fix the example, but what other problems I might encounter. Your answer will be helpful if I choose to use this wrapper and refactor.
Sridhar Iyer
As I understand it, you are implementing a software "watchdog" in your system. While you can recover from most exceptions, there are cases where you can't pretend nothing happened and continue running, namely stack corruption :)
YeenFei
You should **always** use RAII for resource management in C++. Guard resources within a stack-allocated object whose destructor performs the cleanup. Then you can pretty much remove all the try/catch clauses.
jalf
You will still need some fault containment (try/catch) after implementing RAII.
YeenFei
+2  A: 

The problem with exceptions is that, even if you do re-engineer with RAII, it's still easy to write code that becomes desynchronized:

void SomeClass::SomeMethod()
{
  this->stateA++;
  SomeOtherMethod();
  this->stateB++;
}

Now, the example might look artificial, but if you replace stateA++ and stateB++ with operations that change the state of the class in some way, the expected outcome is for the states to remain in sync. RAII might solve some of the problems associated with state when using exceptions, but all it does is provide a false sense of security: if SomeOtherMethod() throws an exception, ALL the surrounding code needs to be analyzed to ensure that the postconditions (stateA.delta == stateB.delta) are met.

Chris Becke
RAII provides fail-safe handling ONLY for resources. It does not protect an operation or process from falling apart.
YeenFei
RAII techniques can be used to easily solve this problem, if you use a generous definition of *resource*.
Ben Voigt
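
One way to read the "generous definition of resource" comment is a guard object that treats the pending state change as the resource and undoes it unless told otherwise. The sketch below is an assumption about what was meant, not code from the thread: Rollback, inSync and the fail parameter are hypothetical, and the state is reduced to two ints.

```cpp
#include <cassert>
#include <stdexcept>

// Restores an int to its saved value on destruction unless commit() was called.
class Rollback {
public:
    explicit Rollback(int& target) : target_(target), saved_(target) {}
    ~Rollback() { if (!committed_) target_ = saved_; }
    void commit() noexcept { committed_ = true; }
    Rollback(const Rollback&) = delete;
    Rollback& operator=(const Rollback&) = delete;
private:
    int& target_;
    int saved_;
    bool committed_ = false;
};

class SomeClass {
public:
    // fail is a hypothetical switch that makes SomeOtherMethod throw.
    void SomeMethod(bool fail) {
        Rollback guard(stateA);   // remembers stateA before the change
        ++stateA;
        SomeOtherMethod(fail);    // may throw; guard then restores stateA
        ++stateB;
        guard.commit();           // both updates done, keep them
    }
    bool inSync() const { return stateA == stateB; }
private:
    void SomeOtherMethod(bool fail) {
        if (fail) throw std::runtime_error("simulated failure");
    }
    int stateA = 0;
    int stateB = 0;
};
```

This keeps the two counters in sync across a throw, but it only works because the guard knows how to undo the change. Side effects that cannot be reversed (a message already sent, a file already written) still need the analysis described in the answer.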
+2  A: 

This code:

func1();
char *x=malloc(1000);
func2();

Is not C++ code. This is what people refer to as C with classes. It is a style of programming that looks like C++ but does not match how C++ is used in real life. The reason is that good exception-safe C++ code practically never requires the direct use of pointers, as pointers are always contained inside a class specifically designed to manage their lifespan in an exception-safe manner (usually smart pointers or containers).

The C++ equivalent of that code is:

func1();
std::vector<char> x(1000);
func2();
Martin York
+3  A: 

A hardware failure may not lead to a C++ exception. On some systems, hardware exceptions are a completely different mechanism than C++ exceptions. On others, C++ exceptions are built on top of the hardware exception mechanism. So this isn't really a general design question.

If you want to be able to recover, you need to be transactional--each state change needs to run to completion or be backed out completely. RAII is one part of that. As Chris Becke points out in another answer, there's more to state than resource acquisition.

There's a copy-modify-swap idiom that's used a lot for transactions, but that might be way too heavy if you're trying to adapt working code to handle this one-in-a-million case.
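
For reference, a minimal sketch of copy-modify-swap could look like this. Registry and addAll are hypothetical names, and the example assumes the state is a std::vector with the default allocator, whose swap does not throw.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

class Registry {
public:
    // Adds all names or none: a failure part-way leaves entries_ untouched.
    void addAll(const std::vector<std::string>& names) {
        std::vector<std::string> copy = entries_;      // copy the current state
        for (const auto& name : names) {               // modify only the copy
            if (name.empty()) throw std::invalid_argument("empty name");
            copy.push_back(name);                      // may also throw bad_alloc
        }
        entries_.swap(copy);                           // commit without throwing
    }
    std::size_t size() const { return entries_.size(); }
private:
    std::vector<std::string> entries_;
};
```

The cost is a full copy of the state on every update, which is the "heavy" part mentioned above.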

If you truly need robustness, then isolate the code into a process. If a hardware fault kills the process, you can have a watchdog restart it. The OS will reclaim the lost resources. Your code would only need to worry about being transactional with persistent state, like stuff saved to files.
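
A rough sketch of that process isolation, assuming a POSIX system (fork and waitpid), is shown below. runIsolated, superviseWithRestarts and the two worker functions are hypothetical names; a real watchdog would also need backoff, logging, and handling of interrupted waits.

```cpp
#include <cassert>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs work() in a child process. Returns true only if the child
// exited normally with status 0; a crash or signal counts as failure.
bool runIsolated(int (*work)()) {
    pid_t pid = fork();
    if (pid < 0) return false;        // could not create the child
    if (pid == 0) {
        _exit(work());                // child: run the work and exit at once
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

// Retries the work in fresh processes. Returns the number of restarts
// that were needed, or -1 if every attempt failed.
int superviseWithRestarts(int (*work)(), int maxRestarts) {
    for (int attempt = 0; attempt <= maxRestarts; ++attempt) {
        if (runIsolated(work)) return attempt;
    }
    return -1;
}

// Hypothetical workers standing in for the library code.
int healthyWorker() { return 0; }
int crashingWorker() { std::abort(); }  // simulates a fatal fault
```

Because each attempt runs in its own process, memory and handles held by a crashed attempt are reclaimed by the OS; only persistent state such as files still has to be handled transactionally.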

Adrian McCarthy