Background:

Some time ago, I built a system for recording and categorizing application crashes for one of our internal programs. At the time, I used a combination of frequency and aggregated lost time (the time between the program launch and the crash) for prioritizing types of crashes. It worked reasonably well.

Now, The Powers That Be want solid numbers on the cost of each type of crash being worked on. Or at least, numbers that look solid. I suppose I could use the aggregate lost time, multiplied by some plausible figure, but it seems dodgy.

Question:

Are there any established methods of calculating the real-world cost of application crashes? Or failing that, published studies speculating on such costs?


Consensus

Accuracy is impossible, but an estimate based on uptime should suffice if it is applied consistently and its limitations are clearly documented. Thanks, Matt and Orion, for taking the time to answer this.

+7  A: 

I've not seen any studies, but a reasonable heuristic would be something like:

( Time since last application save when crash occurred + Time to restart application ) * Average hourly rate of application operator.
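As a minimal sketch of the heuristic above, assuming times are recorded in minutes and the rate is per hour (the function and parameter names here are made up for illustration):

```python
def crash_cost(minutes_since_last_save, minutes_to_restart, hourly_rate):
    """Rough cost of one crash: lost operator time multiplied by an hourly rate.

    All three inputs are assumptions about what the crash log and the
    payroll figures can provide; none of them is a measured cost.
    """
    lost_hours = (minutes_since_last_save + minutes_to_restart) / 60.0
    return lost_hours * hourly_rate


# Example: 12 minutes of unsaved work, 3 minutes to restart, $40/hour operator.
estimate = crash_cost(12, 3, 40.0)
print(f"Estimated cost of this crash: ${estimate:.2f}")
```

Summing this over every crash of a given type gives a per-type figure that can be tracked over time.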

The estimation gets more complex if the crashes have some impact on external customers, or might delay other things (e.g. create a bottleneck such that another person winds up sitting around waiting because someone else's application crashed).

That said, your 'powers that be' may well be happy with a very rough estimate so long as it's applied consistently and they can see how it is changing over time.

Matt Sheppard
+6  A: 

The Powers That Be want solid numbers on the cost of each type of crash being worked on

I want to fly in my hot air balloon to Mars, but it doesn't mean that such a thing is possible.

Seriously, I think you have a duty to tell them that there is no way to accurately measure this. Tell them you can rank the crashes, or whatever it is that you can actually do with your data, but that's all you've got.

Something like "We can't actually work out how much it costs. We DO have this data about how long things are running for, and so on, but the only way to attach costs is to pretend that X minutes equals X dollars, even though this has no basis in reality."

If you just make up some bullcrap costing algorithm and DON'T push back at all, you only have yourself to blame when management turns around and uses this arbitrary made-up number to do something stupid like fire staff, or decide not to fix any crashes and instead focus on leveraging their synergy with sharepoint portal internet web sharing love server 2013.

Update: To clarify, I'm not saying you should only rely on stats with 100% accuracy, and just give up on everything else.
What I think is important is that you know what it is you're measuring. You're not actually measuring cost, you're measuring uptime. As such, you should be upfront about it. If you want to estimate the cost that's fine, but I believe you need to make this clear.

If I were to produce such a report, I'd call it the 'crash uptime report' and maybe have a secondary field called "Estimated cost based on $5/minute." The managers get their cost estimate, but it's clear that the actual report is based on uptime, that the cost is only an estimate, and how that estimate is calculated.
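A rough sketch of what such a report could look like, assuming each crash record is a (crash type, uptime in minutes) pair. The record layout and all names are hypothetical, and the $5/minute rate is the arbitrary placeholder from above, not a measured figure:

```python
from collections import defaultdict

# Arbitrary placeholder rate; the report should state this openly.
ASSUMED_COST_PER_MINUTE = 5.0


def crash_uptime_report(crashes, cost_per_minute=ASSUMED_COST_PER_MINUTE):
    """Aggregate lost uptime per crash type and attach a labelled estimate.

    crashes: iterable of (crash_type, uptime_minutes) pairs.
    Returns a list of rows sorted by total lost uptime, largest first.
    """
    totals = defaultdict(lambda: {"count": 0, "uptime_minutes": 0.0})
    for crash_type, uptime_minutes in crashes:
        totals[crash_type]["count"] += 1
        totals[crash_type]["uptime_minutes"] += uptime_minutes

    rows = [
        {
            "crash_type": crash_type,
            "count": t["count"],
            "uptime_minutes": t["uptime_minutes"],  # the measured quantity
            "estimated_cost": t["uptime_minutes"] * cost_per_minute,  # derived
        }
        for crash_type, t in totals.items()
    ]
    rows.sort(key=lambda row: row["uptime_minutes"], reverse=True)
    return rows


sample = [("null_deref", 10), ("out_of_memory", 30), ("null_deref", 5)]
print(f"Crash uptime report (estimated cost based on ${ASSUMED_COST_PER_MINUTE:.0f}/minute)")
for row in crash_uptime_report(sample):
    print(row["crash_type"], row["count"], row["uptime_minutes"], row["estimated_cost"])
```

Sorting on uptime rather than on the estimated cost keeps the ranking tied to the thing that was actually measured.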

Orion Edwards
+4  A: 

There is a missing factor here: most applications have a 'buckling' factor where crashes suddenly start "costing" a lot more because people lose confidence in the service your app is providing. Once that happens, it can be very costly to get users back to trusting and using the system.

Christian
A: 

It depends...

In terms of cost, the only thing that matters is the business impact of the crash, so it rather depends on the type of application.

For many applications, it may not be possible to determine business impact. For others, there may be meaningful measures.

Demand-based measures may be meaningful: if sales are steady, then downtime for a sales app may be a useful measure. If sales fluctuate unpredictably, then such measures are less useful.

Cost of repair may also be useful.

Kramii