Hi everyone, I've been facing a serious problem for a couple of weeks. I have an ASP.NET application hosted under IIS7 (W2008 SP1), and every couple of hours it starts consuming nearly 50% of the CPU, even when there may be no users connected. That might be explained by the fact that we use Quartz.net to do some application recycling, but we have not been able to reproduce the problem yet.

Here is a trace made with JetBrains dotTrace 3.1 while the CPU was high: http://mycenter.info/tmp/DotTraceSnapshot.zip

Usually the process wasting CPU is w3wp.exe, but in the last couple of days SQL Server (2008) and memcached (1.2.1, updated on Monday to 1.2.4 beta) were hogging the CPU too. Strangely, memcached sometimes starts consuming 100% while its stats show that it is idle, yet it still works fine when a request is made.

Here's a crash dump (or stack trace dump) of w3wp, taken with WinDbg (based on this guide: http://blogs.technet.com/marcelofartura/archive/2006/09/15/troubleshooting-iis-100-cpu-issues-step-by-step-intermediary.aspx):

0:000> ~
.  0  Id: 1be4.1d3c Suspend: 1 Teb: 7ffdf000 Unfrozen
   1  Id: 1be4.b1c Suspend: 1 Teb: 7ffde000 Unfrozen
   2  Id: 1be4.12a0 Suspend: 1 Teb: 7ffdd000 Unfrozen
   3  Id: 1be4.19d0 Suspend: 1 Teb: 7ffdc000 Unfrozen
   4  Id: 1be4.1714 Suspend: 1 Teb: 7ffd7000 Unfrozen
   5  Id: 1be4.1a18 Suspend: 1 Teb: 7ffd6000 Unfrozen
   6  Id: 1be4.12ac Suspend: 1 Teb: 7ffd5000 Unfrozen
   7  Id: 1be4.dec Suspend: 1 Teb: 7ffd4000 Unfrozen
   8  Id: 1be4.1e48 Suspend: 1 Teb: 7ffd8000 Unfrozen
   9  Id: 1be4.1ca8 Suspend: 1 Teb: 7ffd3000 Unfrozen
  10  Id: 1be4.1508 Suspend: 1 Teb: 7ffaf000 Unfrozen
  11  Id: 1be4.1bc0 Suspend: 1 Teb: 7ffae000 Unfrozen
  12  Id: 1be4.1f48 Suspend: 1 Teb: 7ffad000 Unfrozen
  13  Id: 1be4.1994 Suspend: 1 Teb: 7ffac000 Unfrozen
  14  Id: 1be4.1a48 Suspend: 1 Teb: 7ffab000 Unfrozen
  15  Id: 1be4.12c8 Suspend: 1 Teb: 7ffa8000 Unfrozen
  16  Id: 1be4.e44 Suspend: 1 Teb: 7ffa7000 Unfrozen
  17  Id: 1be4.19e0 Suspend: 1 Teb: 7ffa6000 Unfrozen
  18  Id: 1be4.19b0 Suspend: 1 Teb: 7ffa2000 Unfrozen
  19  Id: 1be4.1b30 Suspend: 1 Teb: 7ffd9000 Unfrozen
  20  Id: 1be4.1bfc Suspend: 1 Teb: 7ffa3000 Unfrozen
  21  Id: 1be4.1be8 Suspend: 1 Teb: 7ffa1000 Unfrozen
  22  Id: 1be4.1a54 Suspend: 1 Teb: 7ffa5000 Unfrozen
  23  Id: 1be4.b74 Suspend: 1 Teb: 7ff3d000 Unfrozen
  24  Id: 1be4.19b4 Suspend: 1 Teb: 7ff3c000 Unfrozen
  25  Id: 1be4.1460 Suspend: 1 Teb: 7ffdb000 Unfrozen
  26  Id: 1be4.1eac Suspend: 1 Teb: 7ffaa000 Unfrozen
  27  Id: 1be4.1b90 Suspend: 1 Teb: 7ffa4000 Unfrozen


0:023> #23s
Search address set to 77dc9a94
*** WARNING: Unable to verify checksum for SMDiagnostics.ni.dll
*** WARNING: Unable to verify checksum for System.Data.ni.dll
*** ERROR: Module load completed but symbols could not be loaded for Microsoft.Web.Services3.DLL
*** WARNING: Unable to verify checksum for System.Windows.Forms.ni.dll
*** WARNING: Unable to verify checksum for System.Web.ni.dll
*** WARNING: Unable to verify checksum for Ademy.UI.Web.DLL
*** ERROR: Module load completed but symbols could not be loaded for AjaxControlToolkit.DLL
*** ERROR: Module load completed but symbols could not be loaded for 7zSharp.DLL
*** WARNING: Unable to verify checksum for mscorlib.ni.dll
*** ERROR: Module load completed but symbols could not be loaded for Iesi.Collections.DLL
*** WARNING: Unable to verify checksum for System.Design.ni.dll
*** WARNING: Unable to verify checksum for System.Core.ni.dll
*** WARNING: Unable to verify checksum for Ademy.Event.DLL
*** WARNING: Unable to verify checksum for System.ServiceModel.ni.dll
*** ERROR: Module load completed but symbols could not be loaded for System.ServiceModel.ni.dll
*** WARNING: Unable to verify checksum for App_Theme_Ocean.wgubmrqt.dll
*** WARNING: Unable to verify checksum for NHibernate.Burrow.AppBlock.DLL
*** ERROR: Module load completed but symbols could not be loaded for NHibernate.Burrow.AppBlock.DLL
*** WARNING: Unable to verify checksum for NHibernate.Caches.SysCache2.DLL
*** ERROR: Module load completed but symbols could not be loaded for NHibernate.Caches.SysCache2.DLL
*** WARNING: Unable to verify checksum for Ademy.UI.Web.Controls.DLL
*** WARNING: Unable to verify checksum for Microsoft.JScript.ni.dll
*** WARNING: Unable to verify checksum for System.Web.Mobile.ni.dll
*** WARNING: Unable to verify checksum for System.Runtime.Serialization.ni.dll
          ^ Memory access error in '#23s'

0:023> kb
ChildEBP RetAddr  Args to Child             
11c6ede4 77dc8ed4 766bc622 0000038c 00000000 ntdll!KiFastSystemCallRet
11c6ede8 766bc622 0000038c 00000000 11c6ee20 ntdll!NtSetEvent+0xc
11c6edf8 011011ef 0000038c 7f52be6e 0fda4888 kernel32!SetEvent+0x10
WARNING: Frame IP not in any known module. Following frames may be wrong.
11c6ee20 71b26ffe 060c5f9c 010039b0 010628a0 0x11011ef
*** WARNING: Unable to verify checksum for System.ni.dll
11c6ee4c 712c4b14 02528958 060c5f9c 11c6ee94 mscorlib_ni+0x216ffe
11c6ee5c 712c4abe 060c5fb0 02528958 060c600c System_ni+0x144b14
11c6ee94 71679260 060c5d24 7167926d 060c5d24 System_ni+0x144abe
11c6eec8 717d8373 060c5d24 11c6f3e8 712c4ce4 System_ni+0x4f9260
11c6ef14 712c4ce4 00000000 02528930 11c6ef74 System_ni+0x658373
11c6ef54 7129dbcb 098b6ac4 11c6efec 72f7eff8 System_ni+0x144ce4
11c6efa4 71b26d66 02df349c 11c6efc0 71b45681 System_ni+0x11dbcb
11c6efb0 71b45681 00000000 0dcfd2d8 11c6efd0 mscorlib_ni+0x216d66
11c6efc0 72f11b4c 766b45f1 00000000 11c6f050 mscorlib_ni+0x235681
11c6efd0 72f221f9 11c6f0a0 00000000 11c6f070 mscorwks!CallDescrWorker+0x33
11c6f050 72f36571 11c6f0a0 00000000 11c6f070 mscorwks!CallDescrWorkerWithHandler+0xa3
11c6f194 72f365a4 71a91ff0 11c6f2c8 11c6f1e8 mscorwks!MethodDesc::CallDescr+0x19c
11c6f1b0 72f365c2 71a91ff0 11c6f2c8 11c6f1e8 mscorwks!MethodDesc::CallTargetWorker+0x1f
11c6f1c8 7302a471 11c6f1e8 68e9b644 0dcfd2d8 mscorwks!MethodDescCallSite::CallWithValueTypes+0x1a
11c6f394 7302a5c6 11c6f424 68e9b194 02df34e4 mscorwks!ExecuteCodeWithGuaranteedCleanupHelper+0x9f
11c6f444 71b45577 11c6f3e8 02df17d0 01c177f8 mscorwks!ReflectionInvocation::ExecuteCodeWithGuaranteedCleanup+0x10f

Thanks in advance for any tip!!

UPDATE:

Here's the managed stack of the hung thread. I think it looks like the memcached provider, but I'm not yet sure what I should do.

0:023> !clrstack
OS Thread Id: 0xb74 (23)
ESP       EIP     
11c6ee38 77dc9a94 [NDirectMethodFrameStandaloneCleanup: 11c6ee38] Microsoft.Win32.Win32Native.SetEvent(Microsoft.Win32.SafeHandles.SafeWaitHandle)
11c6ee48 71b26ffe System.Threading.EventWaitHandle.Set()
11c6ee54 712c4b14 System.Net.TimerThread.Prod()
11c6ee64 712c4abe System.Net.TimerThread+TimerQueue.CreateTimer(Callback, System.Object)
11c6eea0 71679260 System.Net.ConnectionPool.CleanupCallbackWrapper(Timer, Int32, System.Object)
11c6eed4 717d8373 System.Net.TimerThread+TimerNode.Fire()
11c6ef1c 712c4ce4 System.Net.TimerThread+TimerQueue.Fire(Int32 ByRef)
11c6ef5c 7129dbcb System.Net.TimerThread.ThreadProc()
11c6efac 71b26d66 System.Threading.ThreadHelper.ThreadStart_Context(System.Object)
11c6efb8 71b45681 System.Threading.ExecutionContext.runTryCode(System.Object)
11c6f3e8 72f11b4c [HelperMethodFrame_PROTECTOBJ: 11c6f3e8] System.Runtime.CompilerServices.RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup(TryCode, CleanupCode, System.Object)
11c6f450 71b45577 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
11c6f46c 71b301c5 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
11c6f484 71b26ce4 System.Threading.ThreadHelper.ThreadStart()
11c6f6b0 72f11b4c [GCFrame: 11c6f6b0] 
11c6f9a0 72f11b4c [ContextTransitionFrame: 11c6f9a0] 

SOLUTION FOUND:

It was due to a bug in memcached 1.2.1 for Win32 when running on Windows 2008. I updated to v1.2.6 and everything worked. I guess w3wp was the process showing high CPU because the library I'm using to connect to memcached has a recycle process that was hanging, even though memcached was still responding.

SOLUTION 2 FOUND:

If the first solution doesn't work, please read THIS POST. I guess the memcached solution just hid the real problem, which was a bug in SmtpClient.

+2  A: 

In windbg, issue:

~*e !clrstack

This will dump all of the managed thread stacks and should give you an idea of what is happening in that process.

Also try a !runaway, which will show you how much time each thread has been running. Focus on the stacks of the top threads which are the ones that have been running longest.
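
Put together, a session along these lines might look like the sketch below. It assumes the process runs CLR 2.0 (the mscorwks frames in the native stack in the question suggest that), so SOS can be loaded with .loadby instead of being copied by hand. Thread 23 is simply the one from the dump in the question, so substitute whichever thread !runaway reports as busiest. The "#23s" attempt in the question fails because the thread-switch command is written with a tilde.

```
$$ Load the SOS extension that matches the CLR loaded in the process
.loadby sos mscorwks
$$ Show accumulated CPU time per thread, highest first
!runaway
$$ Switch to the busiest thread (23 in the dump above) and print its managed stack
~23s
!clrstack
$$ Or print the managed stack of every thread
~*e !clrstack
```

Threads that have no managed frames will report an error from !clrstack, which is expected for purely native threads.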

Matt Wrock
Thanks! The "~*e !clrstack" command outputs "No export clrstack found" about 20 times, and !runaway shows that the thread (#23 / Id: 1be4.b74) has been running for 10 minutes. I didn't know that command, but I had already worked out that this was the bad thread by following the steps described in the article. Any other ideas? How can I make "~*e !clrstack" work?
Diego Jancic
You will need to load sos.dll to get !clrstack to work. Copy sos.dll from the .NET Framework folder to the folder where WinDbg is installed. Then, in WinDbg, type .load sos.dll. Once sos.dll is loaded you will be able to run !clrstack.
Matt Wrock
Thanks! I've added the managed stack to the question; however, it's not my code that is consuming CPU... :(
Diego Jancic
I've found the solution (see above), thanks for your help!!
Diego Jancic
A: 

Is this possibly being caused by a cache issue? For example, do you have a cached dataset set to automatically re-load from the DB when it expires?

We had this situation once. We had a large dataset that we wanted to be always available. The data didn't change that often, so we put it in the cache with a 1 hour expiration, and then in our global.asax we handled the removal (as described here) without paying attention to the warning described in the link. We re-loaded the dataset into the cache each time the hour had passed, and this caused high CPU usage and high DB usage every hour, whether or not anyone was using the site.

edit - added

Needless to say, we saw this quickly and learned from our mistake.
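
To illustrate the difference, here is a minimal sketch of the alternative, written in Python rather than ASP.NET, so it is not the code we actually used. The class name LazyExpiringCache and its parameters are hypothetical. The entry is reloaded on the first read after it expires instead of from an expiration callback, so an idle site does not hit the database every hour.

```python
import time
from typing import Any, Callable, Dict, Tuple


class LazyExpiringCache:
    """Hypothetical cache that reloads an entry on the next read after
    it expires, instead of eagerly from a removal callback."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be simulated
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh, no database work
        # Missing or expired: load once, only because someone asked.
        value = loader()
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    load_count = 0

    def load_dataset() -> str:
        # Stand-in for the expensive database query.
        global load_count
        load_count += 1
        return "dataset"

    fake_now = [0.0]
    cache = LazyExpiringCache(3600, clock=lambda: fake_now[0])
    cache.get("big", load_dataset)
    cache.get("big", load_dataset)
    fake_now[0] = 7200.0  # two hours pass with no requests
    print(load_count)     # loads so far, none happened while idle
    cache.get("big", load_dataset)
    print(load_count)     # the reload happened on this read
```

The trade-off is that the first request after expiry pays the load cost, which for us would have been far cheaper than reloading every hour around the clock.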

David Stratton