Lately, I've found myself in quite a few arguments with my boss about how we handle exceptions in our web app (a C# ASP.NET MVC application).

Basically the conversations go something like this:

Boss: "There is something wrong with our program, client x's database went down today and everyone is seeing the error page."

Me: "Nearly every page in the application uses the database for something (except the error page); there is no reasonable alternative to showing the error page."

Boss: "Our application should be more resilient; the parts of the application that don't require database access should still function."

Often the cases are as extreme as this one. Sometimes, though, we are integrating with another service and could still safely show the other portions of the page, or complete the operation, albeit with some annoying code, because later portions of the code need the results of the operation that may have failed. If there are many possible points of failure, this can turn into extremely unmanageable code.

In general, for a "normal" web application (not mission-critical, etc.), how much time do "good" developers spend trying to make their code resilient enough to handle these kinds of situations? My boss seems to think that the code should be able to handle almost any situation ("can't you just catch an exception?"). I don't see how this can be economical when there are many possible points of failure.

+11  A: 

I'd leave it to the boss to decide. Give him an estimate, in hours, of how long it would take to make the app "resilient" and let him decide whether it's worth the investment.

Vitalik
This is, in fact, the approach I usually take. It is usually quite effective, though I sometimes sense that my estimates are seen as overblown and that it should be a matter of "just catching an exception", which it rarely is.
Krazzy
+2  A: 

It depends... many web pages display and render individual "widgets". Each widget may draw from a different data source... some may even be completely static. I agree with your boss from the standpoint that if loading one widget fails for whatever reason, we should still try to load the rest of the page. One failed widget should not keep the user from interacting with the rest of the site.

As an example from StackOverflow, let's say that the "Related" section to the right of these answers fails to load, or that the footer fails to load for some reason. I'd say it's preferable to display an error message in just those portions of the page and still load your question and the answers, since these should be unaffected.

Now, there are situations when the code which loads all those "widgets" fails... or some data source used by your framework fails. In these cases, I think it is reasonable to display a generic error page... If the framework or "widget" loading code fails for whatever reason, this is indeed an exceptional case that probably can't be handled.

In sum, I mostly agree with your boss: as many error cases as possible should be considered and handled, and we should display whatever we can to the user.
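To make the widget idea concrete, here is a minimal sketch, not tied to ASP.NET MVC or any particular framework. `PageRenderer`, the widget names, and the placeholder text are all hypothetical; the only point is that each widget is rendered inside its own try/catch, so one failure costs one section of the page rather than the whole page.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class PageRenderer
{
    // Renders each widget independently. A widget that throws is replaced by
    // a short placeholder instead of taking the whole page down with it.
    public static string Render(IEnumerable<KeyValuePair<string, Func<string>>> widgets)
    {
        var page = new StringBuilder();
        foreach (var widget in widgets)
        {
            try
            {
                page.AppendLine(widget.Value());
            }
            catch (Exception ex)
            {
                // Assumption: a real app would catch narrower exception types
                // and use its own logger rather than the console.
                Console.Error.WriteLine("Widget '" + widget.Key + "' failed: " + ex);
                page.AppendLine("[" + widget.Key + " is temporarily unavailable]");
            }
        }
        return page.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        var widgets = new List<KeyValuePair<string, Func<string>>>
        {
            new KeyValuePair<string, Func<string>>("Question", () => "<div>question and answers</div>"),
            // Simulates a data source that is down.
            new KeyValuePair<string, Func<string>>("Related", () => { throw new TimeoutException("related-links service timed out"); }),
            new KeyValuePair<string, Func<string>>("Footer", () => "<div>footer</div>")
        };

        // The question and footer still render; only "Related" shows a placeholder.
        Console.Write(PageRenderer.Render(widgets));
    }
}
```

If the loop itself or the framework around it fails, nothing here helps, which is the case where a generic error page is still the reasonable answer.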

Polaris878
+3  A: 

I guess I'd be a little worried that you're losing databases with a frequency that causes this question to arise. Even in non-mission critical apps, if you're losing a DB more than once a week, I'd be seeing what I could do about improving that before I worried about crafting ways out of the issue on the user end of the application.

That being said, my company's best practices include coding so that something such as a DB error will fail over gracefully on the user end to a "cannot connect" message in the output rather than a full-blown 404-type error. I've found that it really doesn't add more than a few minutes to the coding process, and the value of not angering the user is well worth the "cost".

bpeterson76
Perhaps the database example wasn't a good one as it is a little extreme. It is usually a matter of integrating with third party web services and the like.
Krazzy
+7  A: 

Most heavily data-driven apps that I've worked on have redirected to a custom error page when the main database (i.e. the one that powers 99% of the pages) is down. (You do want to show a custom screen, though, not just let users see the server error page.)

For external services, such as an SMTP server or a database that isn't used by much of the rest of the app, we'll usually wrap the call in code that just displays on-page feedback if the service or database is down or inaccessible.

It's really up to the client/stakeholder, though: just determine what they want to have happen when the database is down and do it for them. It'll take time, but it shouldn't lead to an unmaintainable app or any other coding horror.

kekekela
+3  A: 

Your boss should have discussed this with the customer before the application was created. Terms like "non-critical" should be defined as a number (a percentage of uptime). Some applications require different parts of the application to have different uptime figures (what your boss suggests), while others are up or down as a whole (how it works now). The "resilient" application will probably be written in a different way (distributed reads, async writes, etc.) than the "normal" application, so it will probably be hard to convert the "normal" application.

A good boss agrees on a good SLA with the customer of the application and communicates it to the developers before development starts. On the other side, a good developer should complain to their boss about incomplete requirements before starting development. When the requirements say nothing about availability, the requirements are incomplete.

When the availability or scalability requirements of an existing application change significantly, it might be very hard to make the changes profitably. When you have a good boss, those requirements will only change significantly when the application is a lot more successful than initially estimated (like 100 times more users). In that case the enormous success will generate enough money to make it profitable to rewrite huge parts of the application.

Paco
+2  A: 

This is probably a more academic answer given that you've inherited this code. If you use, or rather if someone had used, the Microsoft.Practices.EnterpriseLibrary.ExceptionHandling ExceptionPolicy pattern then it is really easy to switch from showing an error page (by throwing exception) to eating exceptions and displaying empty grids, lists, etc.

You may already be aware of this little pattern, but here it is anyway:

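        try
        {
            //get data
        }
        catch (Exception ex)
        {
            // HandleException returns true when the configured policy
            // recommends rethrowing; otherwise the exception is swallowed.
            if (ExceptionPolicy.HandleException(ex, "Data Access Exception"))
                throw; // plain "throw" keeps the original stack trace ("throw ex" resets it)
        }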
matt-dot-net
A: 

The answer is that the database shouldn't go down. Your time and effort would be better spent fixing that than on creating an app that degrades gracefully when the database goes down.

However, since your boss knows everything, you kind of have to do what they say unless you can convince them as to why what they're saying would be a waste of resources.

As for "can't you just catch all exceptions" - that shows a misunderstanding about how exceptions are supposed to be used. It is a good thing(tm) if your app throws an uncaught exception when there is something that went critically wrong, such that it should never ever happen when the server is running as it's supposed to. For example, if the server runs out of memory, or is missing some library or extension it requires, or the database has become corrupt or is down. If, on the other hand, your app throws an uncaught exception when the user types a phone number into the email address field, this is bad, and that should have been caught (or not thrown).

The guts of an uncaught exception (such as the error message and perhaps a backtrace) should never be output to the client on a public server. They should be logged, but the user should instead see a more friendly error message indicating a temporary problem with the site and to try again later.
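As a rough sketch of that split between what gets logged and what the user sees (framework-free on purpose; `ErrorReporter`, the `log` callback, and the short reference ID are my own assumptions, not part of ASP.NET):

```csharp
using System;

public static class ErrorReporter
{
    // Logs the full exception (type, message, stack trace) through the
    // supplied callback and returns a message that is safe to show publicly.
    public static string ToUserMessage(Exception ex, Action<string> log)
    {
        // Hypothetical extra: a short ID so a support request can be matched
        // to the log entry without exposing any internals to the user.
        string referenceId = Guid.NewGuid().ToString("N").Substring(0, 8);

        log("[" + referenceId + "] " + ex);

        return "Sorry, the site is having a temporary problem. "
             + "Please try again later. (Reference: " + referenceId + ")";
    }
}
```

A top-level error handler would call something like `ErrorReporter.ToUserMessage(ex, Console.Error.WriteLine)` and render only the returned string; the exception text itself never reaches the response.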

thomasrutter
+1  A: 

You're probably showing a generic error page in cases where an empty set of data with a "user-friendly" error message would do.

So, for instance, if you're displaying a list of users/messages/dates but the service you collect it from is down, you could probably show an empty set.

So instead of showing:

    500 Server Error

You could show something like:

     User   |    Message     | Date
     ------------------------------
        No data available*

     * Xyz service is down.

Your app still won't work, because the data is not available, but instead of throwing that in the end user's face, you could put up a placeholder with no data.

This varies a lot depending on what you use, but in general terms it could be as simple as:

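    // Start with an empty list so the page can still render if the call fails.
    List<Data> data = new List<Data>();

    try
    {
        data = service.GetData();
    }
    catch (ServiceUnavailableException error)
    {
        // User-friendly on-page message; the list stays empty.
        errorPage.SetMessage(service.GetName() + " service is down");
        // Log the full exception for diagnosis.
        logger.doLog(error);
    }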

That is, initialize your list (or whatever structure you use) to something empty, then fill it from the service. If the call fails, the list remains empty, and you add an error message (user-friendly, please) and log the exception.
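If you have many such call sites, the same pattern can be pulled into one helper so the try/catch is not repeated everywhere. This is only a sketch under my own assumptions: `LoadResult<T>` and `Fallback.Load` are hypothetical names, and catching `Exception` stands in for whatever specific exception types your services actually throw.

```csharp
using System;
using System.Collections.Generic;

// Carries either the loaded items or an empty list plus a friendly message.
public sealed class LoadResult<T>
{
    public List<T> Items { get; private set; }
    public string Error { get; private set; }   // null when the call succeeded

    public LoadResult(List<T> items, string error)
    {
        Items = items;
        Error = error;
    }
}

public static class Fallback
{
    // Runs fetch(); on failure, logs the exception and returns an empty list
    // together with a message the page can display next to the empty grid.
    public static LoadResult<T> Load<T>(string serviceName, Func<List<T>> fetch, Action<Exception> log)
    {
        try
        {
            return new LoadResult<T>(fetch(), null);
        }
        catch (Exception ex)
        {
            log(ex);
            return new LoadResult<T>(new List<T>(), serviceName + " is currently unavailable.");
        }
    }
}
```

A call site then shrinks to something like `var result = Fallback.Load("Xyz service", () => service.GetData(), ex => logger.doLog(ex));`, and the view renders `result.Items` and, when it is not null, `result.Error`.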

OscarRyz