ansaurus

Question

Diagnosing runaway CPU in a .Net production application

Answer 1

+3 A:

A profiler is probably the correct answer here.

If you don't want a "fully fledged profiler" like DotTrace, you might want to try SlimTune. It works fairly well, and is completely free (and open source).

Reed Copsey 2009-11-10 23:58:07

SlimTune is a sampling profiler, not a tracing profiler. It doesn't kill the performance of the half-dead application - it's perfect for what you're after.

Reed Copsey 2009-11-11 00:05:21

I can't use SlimTune: my app is launched as a side effect (it's a Media Center plugin), and it appears SlimTune has no attach option. I'm also worried about how I would explain that installer to my customers; it installs a ton of stuff.

Sam Saffron 2009-11-11 00:11:45

SlimTune is based on ICorProfiler, which does not support attaching; you only get that with ICorDebug.

Sam Saffron 2009-11-11 00:13:50

Well, if it's a side effect, you may want to just set up a remote session so you can work on their system and use MDbg from the command line.

Reed Copsey 2009-11-11 00:18:15

Yeah, but sending them an exe to run would be so much easier. I also want this solution to be scalable and reusable; there are only so many instances of "sam" out there.

Sam Saffron 2009-11-11 00:20:14

@Reed see my answer

Sam Saffron 2009-11-11 08:18:27

Answer 2

+1 A:

I've had luck with the Red Gate ANTS Profiler. However, it does require installation, and I'm pretty sure they don't have a remote option.

Brian 2009-11-11 00:17:07

Answer 3

A:

I know that you specifically said that you didn't want to take complicated dumps and use WinDbg + SOS to analyze them.

However, that may not be necessary. I would suggest using WinDbg anyway, but instead of using dumps, just attach to the process when you see the runaway thread(s). Then all you need to do is run the !runaway command, which gives you the total running time for each thread. The runaway threads will be at the top of the list. Now all you have to do is run !clrstack (with SOS loaded) for the top thread (or threads, as the case may be).

E.g. if thread 4 is your primary suspect, do a ~4e!clrstack to get the managed stack for that thread. That should tell you what the runaway thread is doing.

I'll agree that WinDbg is not the easiest tool to use for many things, but this may actually turn out to be pretty simple, so I hope you'll forgive me for posting something you didn't really want.

If WinDbg is still out of the question, feel free to comment.

Brian Rasmussen 2009-11-11 00:18:37

I completely appreciate that this can be achieved with WinDbg; the trouble is that getting remote access to a home user running my software is tricky, to say the least. (They are generally behind firewalls with very limited time windows; it's an HTPC, and the wife gets really mad when it stops working.) So I really am looking for a very low-impact, one-click solution. Perhaps no such tool exists and it's time to get my hands dirty and write it; surely others need something like this.

Sam Saffron 2009-11-11 00:23:53

Okay. That does make it a bit harder. If dumps are not out of the question and the user happens to run Vista or Windows 7, they can just open Task Manager, right-click the process and select Create Dump File. However, dumps can get rather large, so given your setup it may be difficult to move the dump file from the user's machine to yours, but if it can be done, you can follow a similar approach with the dump.

Brian Rasmussen 2009-11-11 00:40:55

@Brian see my answer

Sam Saffron 2009-11-11 08:17:55

@Sam. That's cool!

Brian Rasmussen 2009-11-11 08:47:54

Mike Stall rules; it's his fault I wrote it :p

Sam Saffron 2009-11-11 10:14:12

Answer 4

+2 A:

It does sound like you need a real profiler, but I thought I'd just throw this out there: PerfMon. It comes with Windows; you can set up a PerfMon profile and send it to the user, who can capture a log and send it back to you.

Here are a couple of links I've kept around for whenever I need a PerfMon refresher: TechNet Magazine from 2008 and a post from the Advanced .NET Debugging blog.

Yoopergeek 2009-11-11 00:55:12

Perfmon is cool and all, but there is no way it can give you managed stack traces.

Sam Saffron 2009-11-11 10:15:13

True...thus my comment on probably needing a real profiler...but I figured it was at least worth mentioning...If you end up writing your own tool to do what you need, please be sure to share it.

Yoopergeek 2009-11-11 14:34:14

@Yoopergeek, I wrote one, see my answer

Sam Saffron 2009-11-11 19:11:16

Answer 5

A:

If you have managed code but no profiler worth using, I've found that throwing log messages into your code is damn useful for spotting infinite loops and following the general progress of multiple threads.

E.g.:

step 1 msg
step 2 msg

thread now 100% and no step 3 msg = bug.
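A minimal Python sketch of the same idea, assuming each thread logs a step marker as it makes progress. The `ProgressLog` class and its `step()`/`stalled()` methods are hypothetical illustrations, not part of any logging library: it remembers the last step each thread reported, and `stalled()` lists threads that have gone quiet for longer than a timeout.

```python
import threading
import time


class ProgressLog:
    """Hypothetical helper: remembers the last step message each thread logged."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = {}  # thread name -> (message, time.monotonic() timestamp)

    def step(self, msg):
        # Record "step N msg" for the calling thread.
        with self._lock:
            self._last[threading.current_thread().name] = (msg, time.monotonic())

    def stalled(self, timeout_s):
        # Threads whose most recent step is older than timeout_s seconds.
        now = time.monotonic()
        with self._lock:
            return {name: msg for name, (msg, ts) in self._last.items()
                    if now - ts > timeout_s}


log = ProgressLog()


def worker():
    log.step("step 1")
    log.step("step 2")
    # In the real bug this is where the thread spins at 100% CPU,
    # so "step 3" is never logged. Here it simply returns.


t = threading.Thread(target=worker, name="worker-1")
t.start()
t.join()
time.sleep(0.2)
print(log.stalled(timeout_s=0.1))  # {'worker-1': 'step 2'}
```

The last step a stalled thread reported tells you roughly where to look: the bug is somewhere between "step 2" and where "step 3" should have been logged.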

Spence 2009-11-11 01:03:12

Answer 6

A:

I think you should look at memory and disk usage as well. If a machine runs out of memory and needs to start using virtual memory (on the disk drive), you'll see a spike in CPU and disk activity. In such conditions what looks like a CPU bottleneck is actually a memory bottleneck.

Frank Schwieterman 2009-11-11 01:05:28

Answer 7

+7 A:

The basic solution

1. Grab managed stack traces of each managed thread.
2. Grab basic thread statistics for each managed thread (user mode and kernel time).
3. Wait a bit.
4. Repeat steps 1-3.
5. Analyze the results, find the threads consuming the most CPU, and present the stack traces of those threads to the user.
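The steps above can be sketched in Python roughly as follows. This is not cpu-analyzer's code (which drives ICorDebug via the MDbg sources); `take_snapshot` is a hypothetical callable standing in for steps 1 and 2, assumed to return `{thread_id: (kernel_time, user_time, stack)}` with cumulative time counters in whatever units the platform reports. The defaults of 10 samples at 1000 ms mirror the ones cpu-analyzer is described as using.

```python
import time
from collections import defaultdict


def sample_threads(take_snapshot, samples=10, interval_ms=1000):
    """Steps 1-4: take `samples` snapshots, `interval_ms` apart."""
    snapshots = []
    for i in range(samples):
        snapshots.append(take_snapshot())
        if i < samples - 1:  # no need to wait after the last snapshot
            time.sleep(interval_ms / 1000)
    return snapshots


def rank_threads(snapshots):
    """Step 5: order threads by CPU time used between the first and last snapshot."""
    first, last = snapshots[0], snapshots[-1]
    stacks = defaultdict(list)  # thread id -> stacks seen across snapshots
    for snap in snapshots:
        for tid, (_kernel, _user, stack) in snap.items():
            stacks[tid].append(stack)
    ranked = []
    for tid, (kernel, user, _stack) in last.items():
        # Threads that appeared mid-run are measured from zero.
        k0, u0, _ = first.get(tid, (0, 0, None))
        ranked.append(((kernel - k0) + (user - u0), tid, stacks[tid]))
    ranked.sort(key=lambda r: r[0], reverse=True)  # biggest CPU consumers first
    return ranked


# Made-up data for two threads; thread 4948 burns CPU between snapshots.
fake = iter([
    {4948: (0, 1_000, ("EvilApp.Program.MisterEvil",)), 12: (0, 50, ("Idle.Wait",))},
    {4948: (0, 9_000, ("EvilApp.Program.MisterEvil",)), 12: (0, 60, ("Idle.Wait",))},
])
snaps = sample_threads(lambda: next(fake), samples=2, interval_ms=0)
for cpu, tid, seen in rank_threads(snaps):
    print(tid, cpu, seen[-1][0])
```

Ranking by the difference between the first and last counters, rather than the raw totals, matters for a long-running process: a thread that did a lot of work at startup but is idle now would otherwise float to the top.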

Managed vs. Unmanaged Stack Traces

There is a big difference between managed and unmanaged stack traces. Managed stack traces contain information about the actual .Net calls, whereas unmanaged ones contain a list of unmanaged function pointers. Since .Net code is JIT-compiled, the addresses of those unmanaged function pointers are of little use when diagnosing a problem with a managed application.

[Image: an unmanaged stack trace, not that useful]

How do you get a managed stack trace for an arbitrary .Net process?

There are two ways you could get managed stack traces for a managed application:

Use CLR profiling (aka. ICorProfiler API)
Use CLR Debugging (aka. ICorDebug API)

What is better in production?

The CLR Debugging APIs have one very important advantage over the profiling ones: they allow you to attach to a running process. This can be critical when diagnosing performance issues in production. Quite often runaway CPU pops up after days of application use, due to some unexpected branch of code executing. At that point, restarting the app (in order to profile it) is not an option.

cpu-analyzer.exe

So, I wrote a little tool that needs no installer and performs the basic solution above using ICorDebug. It's based on the MDbg source, which is all merged into a single exe.

It takes a configurable (default is 10) number of stack traces for all managed threads, at a configurable interval (default is 1000ms).

Here is a sample output:

C:\>cpu-analyzer.exe evilapp
------------------------------------
4948
Kernel Time: 0 User Time: 89856576
EvilApp.Program.MisterEvil
EvilApp.Program.b__0
System.Threading.ExecutionContext.Run
System.Threading._ThreadPoolWaitCallback.PerformWaitCallbackInternal
System.Threading._ThreadPoolWaitCallback.PerformWaitCallback

... more data omitted ...

Feel free to give the tool a shot. It can be downloaded from my blog.

EDIT

Here is a thread showing how I use cpu-analyzer to diagnose such an issue in a production app.

Sam Saffron 2009-11-11 08:12:15

You're pretty quick, but check out my answer. It's based on the idea that, suppose something is making your program take 100 times longer than necessary. If you just take a random-time snapshot of the call stack, you're gonna see it. Even if it only makes the program take 10% longer than it should, if you take 20 shots, you will see it on 2 shots, on average.
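The arithmetic behind this, assuming each snapshot is taken at an independent random moment and the problem code accounts for a fraction f of the running time: each snapshot lands in it with probability f, so n snapshots catch it n * f times on average, and at least once with probability 1 - (1 - f)^n. A small Python check of the figures quoted here:

```python
def expected_hits(f, n):
    """Average number of n random snapshots that land in the problem code."""
    return n * f


def p_at_least_one(f, n):
    """Chance that at least one of n independent random snapshots shows it."""
    return 1 - (1 - f) ** n


# Problem takes 10% of the running time, 20 snapshots:
print(expected_hits(0.10, 20))             # 2.0
print(round(p_at_least_one(0.10, 20), 3))  # 0.878
# Program 100x slower than it should be: 99% of the time is wasted.
print(round(p_at_least_one(0.99, 1), 2))   # 0.99
```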

Mike Dunlavey 2009-11-11 15:12:27

... I bet you could put that logic into your solution, if you haven't already.

Mike Dunlavey 2009-11-11 15:15:23

@Mike that is essentially what I did, I edited my answer to provide a full explanation.

Sam Saffron 2009-11-11 20:00:04

+1 Way to bite the bullet

Josh Stodola 2009-11-11 20:04:18

+1 Nice work, Sam. Now the only bee in my bonnet is what you actually do with the samples. I care less about functions than about the points at which they are called. Then I like to examine individual representative samples, because that tells me more than numbers - it tells me *why* it's spending time, so I can see what the problem actually is. (Sorry to blather on.)

Mike Dunlavey 2009-11-11 20:22:12

... (Also, I look at wall-clock time, not CPU time. Very often the reason the app is hanging is because it's messing with files at a level no one could have guessed.)
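A small Python illustration of the difference, with `time.sleep` standing in for a thread blocked on slow file I/O (an assumption for the demo): wall-clock time advances while the process's CPU time barely moves, so a sampler that only ranks threads by CPU time would not flag this kind of hang.

```python
import time


def measure(fn):
    """Return (wall_seconds, cpu_seconds) spent running fn()."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    fn()
    return time.perf_counter() - wall0, time.process_time() - cpu0


# time.sleep stands in for a thread blocked on slow file or network I/O.
wall, cpu = measure(lambda: time.sleep(0.3))
print(f"wall={wall:.2f}s cpu={cpu:.2f}s")  # wall-clock ~0.3s, CPU near zero
```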

Mike Dunlavey 2009-11-11 20:37:46

Answer 8

A:

The worse a problem is, the easier it is to find by taking snapshots of the call stack at random times.

There is a tool you can get, called Stackshot, that might help in your case. Look here and here.

Mike Dunlavey 2009-11-11 14:58:30

@Mike, Stackshot is designed to work with unmanaged apps and gives unmanaged stack traces; those are not that useful when analyzing managed app performance.

Sam Saffron 2009-11-11 19:13:17

@Sam: Yeah that's why I said "might". On the other hand, I thought maybe you could have made your tool do it. In fact, if the subject app is slow enough, the **ugly** way to find the problem is as Brian suggested, take a memory dump, if it can be done at a random time while the app is "hanging". Like if it's 100 times slower than it should be, one dump will be 99% certain to show it.

Mike Dunlavey 2009-11-11 20:03:36

@Mike see my answer: I take managed stack traces, so yep, I am using a similar technique to the one you described.

Sam Saffron 2009-11-11 20:06:02

Answer 9

+1 A:

Use Sysinternals ProcDump to get a mini-dump and WinDbg + SOS to analyze it.

The ProcDump utility is available here: http://technet.microsoft.com/en-us/sysinternals/dd996900.aspx

Just send the exe to the user and tell him/her to run (for example):

ProcDump -c 90 -s 10 MyProgram.exe

This will dump the process if it consumes over 90% CPU for more than 10 seconds.
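To make the trigger condition concrete, here is a Python sketch of the "above 90% for 10 consecutive seconds" rule, assuming one CPU-usage reading per second. The `first_trigger` function is hypothetical and only illustrates the rule; it is not how ProcDump is implemented.

```python
def first_trigger(cpu_per_second, threshold=90, seconds=10):
    """Index of the sample at which CPU has been above `threshold` percent
    for `seconds` consecutive one-second samples, or None if that never happens."""
    run = 0
    for i, pct in enumerate(cpu_per_second):
        run = run + 1 if pct > threshold else 0  # any dip resets the streak
        if run >= seconds:
            return i
    return None


spike = [20] * 5 + [95] * 12        # sustained spike starting at second 5
blip = [95] * 9 + [50] + [95] * 9   # never 10 seconds in a row
print(first_trigger(spike))  # 14
print(first_trigger(blip))   # None
```

Requiring a sustained streak rather than a single high reading is what keeps short, harmless CPU bursts from producing a dump.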

Nir 2009-11-11 15:20:00

Answer 10

A:

Use the managed debugger. It has helped me before, and only a few files are needed. You could probably just see what is happening (perhaps exception handling stuck in a loop).

leppie 2009-11-11 15:25:14
