Suppose there is a C# program that runs as a Windows service. The service has gone wild and is consuming CPU and memory like mad. Because it's a production system, it has to be restarted very soon, so I don't have much time to gather run-time information. Maybe a quick look at Task Manager ... that's all.

After that, all I have for post-mortem analysis are the log4net log files and the Windows event log.

Suppose that I have found the reason for the problem. Someone else fixes it, and maybe the programmer adds some additional logging so I can find a similar problem faster next time. Nevertheless, I still depend on the quality of the log files and hope that next time a problem will somehow reveal itself in the logs.

Are there other ways to do post-mortem analysis? Maybe something like thread dumps (as in Java), memory dumps, or something else that would help? Maybe some built-in .NET Framework tool can help?

I am very interested in real project experiences and how you would try to tackle this maintenance question, which I think is very real for most programmers.

+2  A: 

You can take crash dumps of a .NET process and look at them with WinDbg / SOS (and sosassist). It's not simple and fairly hardcore, but it works. Searches on "+windbg +.NET" should prove interesting.

Other than that - resource counters? log files? Lots of things you might look at that can be enabled fairly easily.
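As one illustration of the "enabled fairly easily" idea, here is a minimal sketch of a health snapshot the service could write to its existing log on a timer, so that the log already contains a memory, thread, and GC trend when something goes wrong. The `HealthSnapshot` class and its output format are hypothetical and not from the original post; the sketch only uses `System.Diagnostics.Process` and `System.GC`.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not from the original post): builds a one-line
// snapshot of the current process for periodic logging.
public static class HealthSnapshot
{
    public static string Capture()
    {
        // Process is disposable, so release the handle after reading it.
        using (Process self = Process.GetCurrentProcess())
        {
            return string.Format(
                "workingSet={0}KB managedHeap={1}KB threads={2} cpuTime={3} gen0={4} gen1={5} gen2={6}",
                self.WorkingSet64 / 1024,          // physical memory used by the process
                GC.GetTotalMemory(false) / 1024,   // managed heap estimate, without forcing a collection
                self.Threads.Count,                // OS threads in the process
                self.TotalProcessorTime,           // cumulative CPU time
                GC.CollectionCount(0),             // collections so far, per generation
                GC.CollectionCount(1),
                GC.CollectionCount(2));
        }
    }
}
```

Calling something like `log.Info(HealthSnapshot.Capture())` every minute or so (assuming a log4net `ILog` named `log`) would make a steadily growing heap or a jump in CPU time visible in the log files after the restart.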

Marc Gravell
+1  A: 

A great resource for post-mortem analysis with WinDbg and SOS is Tess Ferrandez' series of blog entries on the subject.

EDIT: Link updated

dpp
I think the right link is: http://blogs.msdn.com/tess/pages/net-debugging-demos-information-and-setup-instructions.aspx ?
Theo Lenndorff
+2  A: 

As Marc says, WinDbg + SOS will let you debug a lot of problems that you can't really address in Visual Studio. There are some excellent tutorials on this blog.

For memory issues you can also look at the .NET performance counters in Perfmon. You can see where objects are located (which generation) and how much time is spent in garbage collection. That should give you some useful information. If you want to know why objects are not being collected, WinDbg and SOS is the way to go. To walk you through a simple session, the steps are:

  1. Inspect the heap using !dumpheap -stat and look for types with a large number of instances. You probably have some idea of what you would expect to find on the heap at any given moment, so if anything looks out of the ordinary, look into that.

  2. Pick a random instance and do a !gcroot on the address of the instance. That will tell you why the object is not being collected.

  3. Repeat

Likely candidates for keeping stuff alive longer than it should are: events, statics and the finalizer queue to name a few.
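The steps above might look roughly like this in a WinDbg session opened on a dump of the service. This is a sketch, not a transcript: the type name `MyApp.Order` and the address are placeholders, and the `.loadby` line assumes CLR 2.0 (on CLR 4 the module is `clr` instead of `mscorwks`).

```text
$$ Load the SOS extension that matches the runtime in the dump
.loadby sos mscorwks

$$ Step 1: per-type summary of the managed heap
!dumpheap -stat

$$ List the individual instances of a suspicious type (placeholder name)
!dumpheap -type MyApp.Order

$$ Step 2: show what keeps one of those instances alive (placeholder address)
!gcroot <address>

$$ Check whether objects are piling up waiting for finalization
!finalizequeue
```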

You may also want to take a look at my answer to this question for more WinDbg material.

Brian Rasmussen
A: 

If the process is still alive, you could run the Managed Stack Explorer against it to get a quick snapshot of what it is doing. You can run this without an explicit install.

Other than that, a full dump + WinDbg + SOS gives you the most information, but getting at it isn't trivial.

Rob Walker
+1  A: 

Unfortunately I've had to do a fair amount of this. The best tool I have come across is cordbg, which comes with the SDK (you'll need the correct version for your .NET version). See http://msdn.microsoft.com/en-us/library/a6zb7c8d.aspx for details.

Attach to the running process in cordbg (a <[pid]>), attach to each running thread (t <[tid]>), then dump the stack for each thread (w).

Automating this task with a little VBScript and writing the output to a file lets you run the tool several times in a row. Comparing the thread stacks across runs will give you a very good idea of where your application is spending its time.

The nice thing about this approach, especially with the dumps automated, is that you can grab all the information very quickly and get your process restarted with as little delay as possible.

headsling