I'm writing some client/server software, and I'm facing the following design issue. Normally I use a VERIFY macro very liberally: if something is wrong on a user's machine, I want the software to fail and log the error so it can be fixed. I've never been a fan of ignoring errors.
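
For reference, the kind of macro I mean looks roughly like this (a simplified sketch; the logging is illustrative, not my actual code):

    #include <cstdio>
    #include <cstdlib>

    /* Fail-fast check: on failure, log the expression and location,
     * then abort so the bug gets noticed and fixed. */
    #define VERIFY(expr)                                            \
        do {                                                        \
            if (!(expr)) {                                          \
                std::fprintf(stderr, "VERIFY failed: %s (%s:%d)\n", \
                             #expr, __FILE__, __LINE__);            \
                std::abort();                                       \
            }                                                       \
        } while (0)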

However, I'm now writing a server. If the server dies, many clients go down with it, so the server should die as rarely as possible. That leaves me unsure how to treat some conditions that I'd otherwise treat as fatal.

For example, suppose I get a network packet from a user who isn't logged in. Even though it shouldn't happen, I have enough experience to know that "impossible" errors do happen from time to time. So I'm pretty sure that if I make these cases fatal, the server WILL crash eventually. On the other hand, I could log the error, ignore it, and continue, but I'm afraid some bugs may go undetected that way.
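
Concretely, the decision point looks something like this (a simplified sketch; the types and handler are illustrative):

    #include <cstdio>

    struct Packet  { int type = 0; };                        // illustrative
    struct Session { int id = 0; bool logged_in = false; };  // illustrative

    void handle_packet(Session& session, const Packet& packet) {
        if (!session.logged_in) {
            // Option A: VERIFY(session.logged_in) -- one bad packet
            //           takes the whole server down.
            // Option B: log it, drop the packet, keep serving everyone:
            std::fprintf(stderr,
                         "packet type %d from unauthenticated session %d, dropped\n",
                         packet.type, session.id);
            return;
        }
        // ... normal dispatch would go here ...
    }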

What would you do in a situation like this one?

+3  A: 

If you can recover from the error, then obviously it wasn't fatal. I can't see the benefit of failing when you could log the error and continue execution; the most important thing is that you've captured the error in the log. If you can recover and continue to operate as normal, then that is the best course.
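
For instance, a recoverable check might look like this (a minimal sketch of the idea; the macro name is made up): assert hard in debug builds so bugs are loud during development, but log and continue in release builds so a single "impossible" condition can't take the server down.

    #include <cassert>
    #include <cstdio>

    #ifdef NDEBUG
    /* Release build: log the failed check and carry on. */
    #define SOFT_VERIFY(expr)                                      \
        do {                                                       \
            if (!(expr)) {                                         \
                std::fprintf(stderr, "check failed: %s (%s:%d)\n", \
                             #expr, __FILE__, __LINE__);           \
            }                                                      \
        } while (0)
    #else
    /* Debug build: fail fast, as with a normal assert. */
    #define SOFT_VERIFY(expr) assert(expr)
    #endif

Note that the call site still needs its own recovery path (e.g. dropping the packet and returning), since the release build carries on past a failed check.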

In addition, you should implement a notification system (server monitoring) that, depending on the error level, notifies you with varying degrees of urgency, so you pick up on anything time-critical as soon as possible. There are generic systems like that for servers, such as Nagios and Munin. Have a look at what they do and see whether you can take something from them and implement or integrate it into your system.
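
The prerequisite on the server side is simply that errors are logged with a severity level a monitoring check can key on. A minimal sketch (the levels and function name are illustrative):

    #include <cstdarg>
    #include <cstdio>

    enum class Severity { Info, Warning, Error, Critical };

    /* Prefix each log line with its severity so an external check
     * (Nagios-style) can match on ERROR/CRITICAL and page accordingly. */
    void server_log(Severity level, const char* fmt, ...) {
        static const char* names[] = {"INFO", "WARNING", "ERROR", "CRITICAL"};
        std::fprintf(stderr, "[%s] ", names[static_cast<int>(level)]);
        va_list args;
        va_start(args, fmt);
        std::vfprintf(stderr, fmt, args);
        va_end(args);
        std::fputc('\n', stderr);
    }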

Regardless, you should try to make sure client instances are as sandboxed as possible. A client thread going down should never take down the entire server (at least in theory); see the sketch below.
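
A minimal sketch of that isolation, assuming one thread per client and exceptions as the error mechanism (all names illustrative):

    #include <cstdio>
    #include <exception>

    struct Session { int id = 0; };                          // illustrative

    void serve_client(Session&) { /* real per-client work */ }

    /* Per-client thread body: anything thrown by one client's handler
     * is logged and ends that session only; it never propagates out
     * and kills the server process. */
    void client_thread_main(Session session) {
        try {
            serve_client(session);
        } catch (const std::exception& e) {
            std::fprintf(stderr, "client %d failed: %s\n",
                         session.id, e.what());
        }
    }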

Eran Galperin