An ASP.NET web app running on IIS6 periodically shoots the CPU up to 100%. It's the W3WP that's responsible for nearly all CPU usage during these episodes. The CPU stays pinned at 100% anywhere from a few minutes to over an hour.

This is on a staging server and the site is only getting very light traffic from testers at this point.

We've been running ANTS profiler on the server, but it's been unenlightening.

Where can we start finding out what's causing these episodes and what code is keeping the CPU busy during all that time?

A: 

We had this with a recursive query that was dumping tons of data to the output. Have you double-checked that everything exits and that no infinite loops exist?

You might try to narrow it down to a single page. We found ANTS wasn't much help in that same case either. What we ended up doing was running the site, hitting a page and watching the CPU, then hitting the next page and watching the CPU again. It's very methodical and time-consuming, but if you can't find it with some code tracing you might be out of luck.

We were able to use the IIS log files to track it to a set of suspect pages.
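
To illustrate the log-file approach, here is a minimal Python sketch (not what was actually used above) that totals the `time-taken` value per URL from W3C extended log lines. It assumes the `cs-uri-stem` and `time-taken` fields are enabled in the IIS logging settings; the function name `slowest_pages` and the sample log entries are made up for illustration.

```python
from collections import defaultdict


def slowest_pages(log_lines, top=3):
    """Sum time-taken (milliseconds) per URL from W3C extended log lines."""
    fields = []
    totals = defaultdict(int)
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The #Fields directive names the columns of the rows that follow.
            fields = line.split()[1:]
            continue
        if line.startswith("#") or not fields:
            continue  # other directives, or data before any #Fields line
        row = dict(zip(fields, line.split()))
        try:
            totals[row["cs-uri-stem"]] += int(row["time-taken"])
        except (KeyError, ValueError):
            continue  # field not logged, or value is "-"
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Hypothetical sample data; real logs will have more fields.
SAMPLE_LOG = [
    "#Software: Microsoft Internet Information Services 6.0",
    "#Fields: date time cs-method cs-uri-stem sc-status time-taken",
    "2010-01-05 14:02:11 GET /Default.aspx 200 120",
    "2010-01-05 14:02:15 GET /Reports/Summary.aspx 200 48211",
    "2010-01-05 14:02:40 GET /Default.aspx 200 95",
    "2010-01-05 14:03:02 GET /Reports/Summary.aspx 200 61007",
]

if __name__ == "__main__":
    for url, total_ms in slowest_pages(SAMPLE_LOG):
        print(url, total_ms)
```

Pages that dominate the total are the ones to hit by hand first while watching the CPU.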

Hope that helps!

codemypantsoff
+3  A: 

If your CPU is spiking to 100% and staying there, it's quite likely that you either have a deadlock scenario or an infinite loop. A profiler seems like a good choice for finding an infinite loop. Deadlocks are much more difficult to track down, however.

Use the dotTrace profiler and choose "Call Tree" by "All Threads"; that should show the methods where most CPU time is spent, grouped into one call stack. Drill down to see exactly where.
Peter Gfader
+2  A: 

It's not much of an answer, but you might need to go old school and capture a snapshot (memory dump) of the IIS process and debug it. You might also want to check out Tess Ferrandez's blog. She is a kick a** Microsoft escalation engineer, and although her blog focuses on debugging ASP.NET on Windows, it is relevant to Windows debugging in general. If you select the ASP.NET tag (which is what I've linked to), you'll see several items that are similar.

Michael Bray
A: 

Also, look at your perfmon counters. They can tell you where a lot of that CPU time is being spent. Here are links to the most common counters to use: http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/852720c8-7589-49c3-a9d1-73fdfc9126f0.mspx?mfr=true and http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/be425785-c1a4-432c-837c-a03345f3885e.mspx?mfr=true

RockySanders99
+1  A: 

Process Explorer is an excellent tool for troubleshooting. You can use it to track down the cause of high CPU usage, and it gives you insight into the way your application works.

You can also try ProcDump (http://technet.microsoft.com/en-us/sysinternals/dd996900.aspx) to dump the process and analyze what was really happening on the CPU.

sky100
+6  A: 
  1. Standard Windows performance counters (look for other correlated activity, such as many GET requests, excessive network or disk I/O, etc); you can read them from code as well as from perfmon (to trigger data collection if CPU use exceeds a threshold, for example)
  2. Custom performance counters (particularly to time off-box requests and other calls where execution time is uncertain)
  3. Load testing, using tools such as Visual Studio Team Test or WCAT
  4. If you can test on or upgrade to IIS 7, you can configure Failed Request Tracing to generate a trace if requests take more than a certain amount of time
  5. Use logparser to see which requests arrived at the time of the CPU spike
  6. Code reviews / walk-throughs (in particular, look for loops that may not terminate properly, such as if an error happens, as well as locks and potential threading issues, such as the use of statics)
  7. CPU and memory profiling (can be difficult on a production system)
  8. Process Explorer
  9. Windows Resource Monitor
  10. Detailed error logging
  11. Custom trace logging, including execution time details (perhaps conditional, based on the CPU-use perf counter)
  12. Are the errors happening when the AppPool recycles? If so, it could be a clue.
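
Items 1 and 5 can be combined: find the windows where CPU stayed high, then list the requests that were logged inside them. A rough Python sketch of that logic follows. It assumes you have already exported CPU samples (for example from a perfmon counter log) and request timestamps (from the IIS log) as plain tuples; the function names, thresholds, and sample data are hypothetical.

```python
def find_cpu_episodes(samples, threshold=90.0, min_samples=3):
    """Return (start, end) timestamps of runs where CPU stayed at or above
    `threshold` for at least `min_samples` consecutive samples.

    samples: iterable of (timestamp, cpu_percent), ordered by timestamp.
    """
    episodes = []
    run = []
    for ts, cpu in samples:
        if cpu >= threshold:
            run.append(ts)
            continue
        if len(run) >= min_samples:
            episodes.append((run[0], run[-1]))
        run = []
    if len(run) >= min_samples:  # episode still in progress at end of data
        episodes.append((run[0], run[-1]))
    return episodes


def requests_during(episodes, requests):
    """Return the (timestamp, url) requests that fall inside any episode."""
    return [
        (ts, url)
        for ts, url in requests
        if any(start <= ts <= end for start, end in episodes)
    ]


# Hypothetical data: timestamps are seconds since the start of the capture.
CPU_SAMPLES = [(0, 12), (5, 97), (10, 100), (15, 100), (20, 99),
               (25, 20), (30, 95), (35, 30)]
REQUESTS = [(2, "/Default.aspx"), (6, "/Reports/Summary.aspx"),
            (28, "/Default.aspx")]

if __name__ == "__main__":
    episodes = find_cpu_episodes(CPU_SAMPLES)
    print(episodes)
    print(requests_during(episodes, REQUESTS))
```

Requiring several consecutive samples keeps a single brief spike (the sample at 30 seconds here) from being reported as an episode.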
RickNZ
I'll just add for the record what ended up taking us to the source of the problem: **SQL Profiler**. We had a complex LINQ to SQL query that included a reference to an in-memory object, so LINQ to SQL was unable to translate the whole query to SQL and instead fired off literally thousands of little SQL queries to perform the join.
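
For anyone unfamiliar with this failure mode, here is a small Python/sqlite3 sketch of the same pattern. It is not the original LINQ to SQL code; the tables, names, and the `count_selects` helper are invented for illustration. It counts the SELECT statements issued when the join is done one in-memory item at a time versus in a single query.

```python
import sqlite3


def count_selects(run):
    """Call run(conn) on a small in-memory database and return
    (result, number of SELECT statements the call issued)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id INTEGER, total REAL);
    """)
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(i, f"customer{i}") for i in range(1, 101)])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(i, i, 10.0 * i) for i in range(1, 101)])
    conn.commit()

    statements = []
    conn.set_trace_callback(statements.append)  # record each statement run
    result = run(conn)
    conn.set_trace_callback(None)
    conn.close()
    selects = [s for s in statements
               if s.lstrip().upper().startswith("SELECT")]
    return result, len(selects)


def join_in_application(conn):
    """Anti-pattern: one query per in-memory item."""
    wanted_ids = range(1, 101)  # stands in for the in-memory object
    rows = []
    for customer_id in wanted_ids:
        rows += conn.execute(
            "SELECT customer_id, total FROM orders WHERE customer_id = ?",
            (customer_id,),
        ).fetchall()
    return rows


def join_in_database(conn):
    """The whole join expressed as a single SQL statement."""
    return conn.execute(
        "SELECT c.id, o.total FROM customers c "
        "JOIN orders o ON o.customer_id = c.id"
    ).fetchall()


if __name__ == "__main__":
    for variant in (join_in_application, join_in_database):
        rows, selects = count_selects(variant)
        print(variant.__name__, "rows:", len(rows), "SELECTs:", selects)
```

Both variants return the same rows, but the first issues one SELECT per item while the second issues one in total. A statement trace like this, or SQL Profiler against SQL Server, makes the difference obvious.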
Herb Caudill