What information do you capture when your software crashes in the field?

+8 Q:

I am working on rewriting my unexpected error handling process, and I would like to ask the community:

What information do you capture, both automatically and manually, when software you have written crashes?

Right now, I capture a few items, some of which are:

Automatic:

Name of app that crashed
Version of app that crashed
Stack trace
Operating System version
RAM used by the application
Number of processors
Screenshot (only on non-public applications)
User name and contact information (from Active Directory)

Manual:

What context is the user in (e.g. what company, tech support call number, RA number, etc.)
What did the user expect to happen? (Typical response: "Not to crash")
Steps to reproduce.

What other bits of information do you capture that help you discover the true cause of an application's problem, especially given that most users simply mash the keyboard when asked to tell you what happened?

For the record I’m using C#, WPF and .NET version 4, but I don’t necessarily want to limit myself to those.
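Since the question is not limited to .NET, here is a minimal Python sketch of how the automatic items listed above might be gathered in one place. The names APP_NAME, APP_VERSION and build_crash_report are hypothetical, and items that need platform-specific calls (RAM used, screenshot, Active Directory lookup) are left out on purpose.

```python
import os
import platform
import traceback

APP_NAME = "ExampleApp"   # hypothetical application name
APP_VERSION = "1.2.3"     # hypothetical version string


def build_crash_report(exc: BaseException) -> dict:
    """Collect a few of the automatic items for one caught exception."""
    return {
        "app": APP_NAME,
        "version": APP_VERSION,
        # Full formatted trace, which also contains the exception type and message.
        "stack_trace": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
        "os": platform.platform(),
        "processors": os.cpu_count(),
    }
```

The resulting dictionary can be serialized (for example to JSON) and attached to whatever the user types into the manual fields.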

LA Transtar also keeps a key log that is saved only for failures. This log contains the input and a trace of the program as it is proceeding. The log is reset at the start of each new transaction.
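A per-transaction log of this kind could look roughly like the following Python sketch. It is only an illustration of the idea described above, not the actual system; TransactionLog and run_transaction are hypothetical names.

```python
from collections import deque


class TransactionLog:
    """In-memory trace that is written to disk only when a transaction fails."""

    def __init__(self, path, max_entries=1000):
        self.path = path
        # A bounded deque keeps memory use flat for long transactions.
        self.entries = deque(maxlen=max_entries)

    def begin(self):
        """Reset the log at the start of each new transaction."""
        self.entries.clear()

    def record(self, message):
        self.entries.append(message)

    def save(self):
        with open(self.path, "w", encoding="utf-8") as fh:
            fh.write("\n".join(self.entries))


def run_transaction(log, work, user_input):
    """Run work(user_input, log); persist the trace only if it raises."""
    log.begin()
    log.record(f"input: {user_input!r}")
    try:
        return work(user_input, log)
    except Exception as exc:
        log.record(f"failed: {exc!r}")
        log.save()
        raise
```

Successful transactions leave nothing on disk, so the saved file always describes the input and steps of the failing transaction only.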

Dave 2010-05-13 20:34:37

(This is somewhat Windows / .NET specific, but that is what you specified in the question, and I think this is quite useful info in that context.)

Unless your application is strictly single-threaded, you want a dump file (which will give you the stack for all threads, at a minimum), not just a stack trace for the thread throwing the exception.

Generating a dump that is not too big and that has enough information to give you useful managed stack traces is a bit tricky, but there is a very useful utility called clrdump that will handle some of the gorier details for you.

Clrdump is mostly a wrapper for Microsoft's DbgHelp.dll. You can use DbgHelp directly - see this question - but then you will get a "full minidump" which will be as big as the virtual address space of your application, which can be pretty large. Clrdump does a nice job of creating a small dump with just the stack traces plus enough info for SOS to be able to read them.
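The point about capturing every thread, not only the one that threw, applies outside .NET as well. As a rough analogue (this is not clrdump or DbgHelp, and it produces text rather than a dump file), a Python process can list the current stack of all its threads. The function name dump_all_thread_stacks is hypothetical, and sys._current_frames() is documented as a function for specialized use, so treat this as a sketch.

```python
import sys
import threading
import traceback


def dump_all_thread_stacks() -> str:
    """Return the current stack of every thread in this process as text."""
    names = {t.ident: t.name for t in threading.enumerate()}
    parts = []
    for thread_id, frame in sys._current_frames().items():
        parts.append(f"Thread {names.get(thread_id, '?')} (id {thread_id}):")
        parts.append("".join(traceback.format_stack(frame)))
    return "\n".join(parts)
```

The standard library's faulthandler.dump_traceback(all_threads=True) offers a similar view and is designed to work even when the interpreter is in a bad state.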

McKenzieG1 2010-05-14 18:07:52

You do not mention process logging (such as syslog on Linux or the Event Viewer on Windows). Since I also have a sysadmin background, I truly appreciate programs with a logging facility. Even better if the verbosity level can be selected.

It is good for you to know more about the environment, and it is good for your users if they have to do some type of integration work with other tools.

If your users are more technical, you can ask them to set the logging verbosity to the maximum and reproduce the error again.
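A selectable verbosity level can be as simple as reading the level from the environment. The sketch below uses Python's standard logging module; the variable name EXAMPLEAPP_LOG_LEVEL and the logger name "exampleapp" are assumptions made for the example.

```python
import logging
import os

LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def configure_logging(default="WARNING"):
    """Set the app logger's level from EXAMPLEAPP_LOG_LEVEL (hypothetical name)."""
    requested = os.environ.get("EXAMPLEAPP_LOG_LEVEL", default).upper()
    # Unknown values fall back to WARNING instead of raising.
    level = LEVELS.get(requested, logging.WARNING)

    logger = logging.getLogger("exampleapp")
    logger.setLevel(level)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

A technical user can then be asked to set EXAMPLEAPP_LOG_LEVEL=DEBUG, reproduce the error, and send the output.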

Francisco Garcia 2010-05-15 14:06:39

Basically, there is no golden rule that you must follow and implement in every application. Depending on your business application and scenario, different information will be the most appropriate to collect when an error occurs.

The ones you mentioned are OK, but here are a few more things that are good to log:

input parameters for critical and complex operations
the context of your program: the objects running heavy algorithms, i.e. the riskiest classes
the state your program is in (example: if the flow of your program is like a state machine with 5 states, record that you had reached state 3)
if your application is client-server, collect both logs, from the provider side and the consumer side
a memory dump is generally not a good suggestion; do it only when you need to understand problems in frameworks or the JVM (for example) that you have no control over, such as an OutOfMemoryError
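For the first item in the list above, one lightweight option is a decorator that logs the input parameters of a critical operation only when it fails. This is a Python sketch under that assumption; log_inputs_on_failure and the logger name are hypothetical.

```python
import functools
import logging

logger = logging.getLogger("exampleapp.critical")


def log_inputs_on_failure(func):
    """Log the arguments of func (with the traceback) if it raises, then re-raise."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # logger.exception logs at ERROR level and appends the traceback.
            logger.exception(
                "%s failed with args=%r kwargs=%r", func.__name__, args, kwargs
            )
            raise

    return wrapper
```

The same log call is a natural place to include the current state of a state machine, if the program has one. Be careful not to log arguments that contain passwords or other sensitive data.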

Leni Kirilov 2010-05-15 17:04:41

I don't see the most important information in your list (when we are talking about .NET/Java-level code):
the exception type, message and trace.
You can use simple code to catch any exception and write it to a log or send it directly by email.
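As an illustration of the "catch any exception and write it to a log" idea, here is a Python sketch that installs a process-wide hook for unhandled exceptions. It covers only the log-file half (sending email is omitted), and install_crash_logger, log_unhandled_exception and the crash.log path are hypothetical names.

```python
import logging
import sys

logger = logging.getLogger("exampleapp.crash")


def log_unhandled_exception(exc_type, exc_value, exc_traceback):
    """Record the type, message and trace of any otherwise unhandled exception."""
    if issubclass(exc_type, KeyboardInterrupt):
        # Let Ctrl+C behave normally instead of being reported as a crash.
        sys.__excepthook__(exc_type, exc_value, exc_traceback)
        return
    logger.critical(
        "Unhandled exception", exc_info=(exc_type, exc_value, exc_traceback)
    )


def install_crash_logger(log_path="crash.log"):
    """Send crash records to log_path and register the hook."""
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    sys.excepthook = log_unhandled_exception
```

Calling install_crash_logger() once at startup is enough for exceptions on the main thread; exceptions in other threads go through threading.excepthook instead and would need the same treatment.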

Avram 2010-05-15 22:37:56

+1 A:

And now from the paranoia camp :(

Consider what industry the software targets. Gathering any information about the user (even the Active Directory name) or the network can get your app blackballed and potentially carries liability. For example, what if your bug database is compromised and that information is used to break into a bank's or a government laboratory's network? Will the bug report containing their IPs be noticed? Can you be sued? Maybe...

For instance, if you need to gather network-specific data to diagnose network issues, consider having your app replace any system names or IPs with placeholders before the data gets sent back to you (emailSrvr1 and bankAcctNumSrv become srvr1 and srvr2). It makes tracking down issues a bigger pain, but may be worth it. This still captures information that could get you into trouble, but it may help.

I've been working with high-end enterprise and government customers for a few years, which colors my perspective, but it's probably worth considering what you are collecting and how it is being stored.
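The placeholder substitution described above might be sketched like this in Python. It assumes the app already knows the list of sensitive host names and that only IPv4 addresses need masking; the Scrubber class and the ipN placeholder style are hypothetical.

```python
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


class Scrubber:
    """Replace known host names and IPv4 addresses with stable placeholders."""

    def __init__(self, known_hosts):
        # Hosts are numbered in the order given: first -> srvr1, second -> srvr2.
        self._host_alias = {
            host.lower(): f"srvr{i}" for i, host in enumerate(known_hosts, start=1)
        }
        # Longest names first, so a short name cannot shadow a longer one.
        longest_first = sorted(known_hosts, key=len, reverse=True)
        self._host_re = (
            re.compile(
                r"\b(?:" + "|".join(map(re.escape, longest_first)) + r")\b",
                re.IGNORECASE,
            )
            if known_hosts
            else None
        )
        self._ip_alias = {}

    def _replace_ip(self, match):
        ip = match.group(0)
        # The same address always maps to the same placeholder.
        return self._ip_alias.setdefault(ip, f"ip{len(self._ip_alias) + 1}")

    def scrub(self, text):
        if self._host_re is not None:
            text = self._host_re.sub(
                lambda m: self._host_alias[m.group(0).lower()], text
            )
        return IPV4.sub(self._replace_ip, text)
```

Because the mapping is stable, two lines that mention the same server can still be correlated in the report. The mapping itself stays on the customer's machine, where it can be consulted if support needs to know which server srvr2 really was.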

Oldmicah 2010-05-16 12:06:24
