Has anyone created or seen a good fault diagnostic procedure for a web-based solution that an Operations team could use for diagnostics and support?
The solution is a C# system running on IIS that makes use of things like workflow and WCF services. It is a service-based solution and also consumes both external and internal (client-authored) services.
The idea is to provide them with a Fault Diagnostic Tree (something like a flow chart), accompanied by text that gives them more information and/or a checklist to run through.
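To make the idea concrete, here is a rough sketch of how I picture such a tree as data. It is in Python purely for illustration (the real system is C#), and every question, action and checklist item in it is a hypothetical placeholder rather than something taken from our actual system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One step in the tree: a yes/no question, or a leaf holding an action."""
    text: str
    checklist: List[str] = field(default_factory=list)  # extra notes for the operator
    yes: Optional["Node"] = None
    no: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.yes is None and self.no is None


def walk(root: Node, answers: Dict[str, bool]) -> List[str]:
    """Follow the operator's yes/no answers from the root to a leaf.

    Returns the text of every node visited, ending with the action to take.
    """
    node = root
    path = [node.text]
    while not node.is_leaf:
        branch = node.yes if answers[node.text] else node.no
        if branch is None:
            raise ValueError(f"Tree is incomplete below: {node.text}")
        node = branch
        path.append(node.text)
    return path


# Hypothetical example tree; the wording is illustrative only.
TREE = Node(
    "Does the home page load?",
    yes=Node(
        "Do the internal services respond?",
        checklist=["Note the time of the failure", "Note the failing service name"],
        yes=Node("Escalate to the vendor with the exception details"),
        no=Node("Restart the affected service, then re-test"),
    ),
    no=Node(
        "Is IIS running on the web server?",
        yes=Node("Check the network path, then escalate"),
        no=Node("Start IIS (if authorised), then re-test"),
    ),
)

# Example: the page does not load and IIS turns out to be stopped.
steps = walk(TREE, {
    "Does the home page load?": False,
    "Is IIS running on the web server?": False,
})
print(" -> ".join(steps))
```

Keeping the tree as plain data like this would let the flow chart and the accompanying checklists be generated from one source, but that is only one possible design.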
The entire system is absolutely mission critical so any faults must be resolved ASAP.
Are there any standardised methodologies for creating something like this? A Google search turns up a lot from the automotive industry, but all the software-related material requires me to subscribe to some arbitrary site or buy something I have never seen before.
Any help would be much appreciated!
<Update> Just to be clear... I need a guide for what to do in the event of a fault outside the application code itself. There will always be a point where the customer is asked to phone us if some strange exception occurs, but what process should they go through before doing that?
How detailed should this be? What is the starting point: right when the call comes in, or after the ops person has confirmed that it is an actual fault requiring immediate attention? Should I assume that the ops guide will be used by people with very little technical ability? What about authority? Can I assume they have access to restart services, databases and/or IIS, or should I not care and let them build an escalation process around this document?</Update>