Suppose there is a C# program that runs as a Windows service, and suppose the service has gone wild and is consuming CPU and memory like mad. It needs to be restarted very soon because it's a production system, so I don't have much time to gather run-time information. Maybe a quick look at Task Manager, and that's all.
After that, all I have for post-mortem analysis are the log4net log files and the Windows event log.
Suppose I have found the cause of the problem. Someone else fixes it, and maybe the programmer adds some extra logging so that I can find a similar problem faster next time. Nevertheless, I still depend on the quality of the log files and can only hope that the next problem will somehow reveal itself in the logs.
Are there other ways to do post-mortem analysis? Maybe something like thread dumps (as in Java), memory dumps, or something else that could help? Maybe some built-in .NET Framework tool?
I am very interested in real project experience and in how you would tackle this maintenance question, which I think is very real for most programmers.