Hi

I know there is no simple answer to my question, but I would appreciate ideas, guides or some sort of things-to-look-at list.

I have a .NET Windows service that constantly throws OutOfMemoryException. The service has two builds, for x86 and x64 Windows; on x64 it consumes a lot more memory. I have tried profiling it with various memory profilers, but I cannot get a clue what the problem is. The symptoms: the service consumes a lot of virtual memory (VM Size) and crashes after 3 to 12 hours. The behaviour is rather stochastic; there is no observable pattern to the crashes.

I also looked at the performance counters (perfmon.exe). The heap size keeps growing and % time in GC averages 19%. Memory allocation also correlates with % CPU time.

My application uses threads and lock objects, DB connections and a WCF interface. The general question I am trying to answer:

Is the GC simply not fast enough at collecting objects, or are some unmanaged (Windows) objects consuming memory?

See the first app in the list: http://s45.radikal.ru/i109/1003/af/92a389d189e8.jpg

Screenshot of the performance counters view: http://s006.radikal.ru/i215/1003/0b/ddb3d6c80809.jpg

+7  A: 

Is your issue that you don't know what is consuming so much memory? When the process is using a lot of memory, open Task Manager, right-click your process and create a dump file. You can then examine the dump in WinDbg to find out exactly what is allocating memory.

Tess Ferrandez has a lot of excellent demos. She goes through the most useful stuff here...

David Hedlund
+3  A: 

I have used .NET Memory Profiler; it is much better than Microsoft's CLR Profiler. It takes a little while to learn. It can tell you which objects are not being disposed or still have references to them, and you can sort objects by type and by memory. I used the 30-day trial, during which I was able to solve the problem in my application.

affan
+4  A: 

Your problem is likely to be either a classic leak (objects that are still rooted when they shouldn't be) or Large Object Heap (LOH) fragmentation.

The best tool I have found for diagnosing this class of problem is the Son of Strike (SOS) extension to the Windows debugger. Download Microsoft's Debugging Tools for Windows to get the debuggers: CDB is the console debugger (which I prefer, as it seems more responsive) and WinDbg is the same thing wrapped as an MDI app. These tools are quite low-level and have a bit of a learning curve, but they provide everything you need to find your problem.

In particular, run !DumpHeap -stat to see which types of objects are eating your memory. This command also reports, at the bottom of the list, any significant fragmentation it notices. !EEHeap -gc lists the heap segments, for example:

0:000> .loadby sos mscorwks
0:000> !EEHeap -gc
Number of GC Heaps: 1
generation 0 starts at 0x00f7a9b0
generation 1 starts at 0x00e79c3c
generation 2 starts at 0x00b21000
ephemeral segment allocation context: none
 segment    begin allocated     size
00b20000 00b21000  010029bc 0x004e19bc(5118396)
Large object heap starts at 0x01b21000
 segment    begin allocated     size
01b20000 01b21000  01b8ade0 0x00069de0(433632)         

If there are many LOH segments then I would begin to suspect LOH fragmentation.
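If you save the !EEHeap -gc output to a text file (for example with WinDbg's .logopen command), counting segments per heap can be scripted. The Python sketch below is a hypothetical helper, not part of SOS; it assumes the layout shown in the session above (a "segment" header followed by one row per segment that starts with a hex address) and may need adjusting for other SOS or runtime versions.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def is_address(token: str) -> bool:
    """True if token looks like a hex address as SOS prints it (8+ hex digits)."""
    token = token.replace("`", "")  # tolerate debugger-style 64-bit separators
    return len(token) >= 8 and set(token) <= HEX_DIGITS


def count_heap_segments(eeheap_output: str) -> dict:
    """Count small-object-heap and LOH segments in saved `!EEHeap -gc` text."""
    counts = {"soh": 0, "loh": 0}
    section = None    # heap that the following segment rows belong to
    in_table = False  # True while reading rows under a 'segment' header
    for raw in eeheap_output.splitlines():
        line = raw.strip()
        if line.startswith("generation"):
            section, in_table = "soh", False
        elif line.startswith("Large object heap starts at"):
            section, in_table = "loh", False
        elif "heap starts at" in line:
            section, in_table = None, False  # some other heap kind; not counted
        elif line.startswith("segment"):
            in_table = True
        elif in_table:
            fields = line.split()
            if fields and is_address(fields[0]):
                if section is not None:
                    counts[section] += 1
            else:
                in_table = False  # first non-row line ends the table
    return counts


# The output from the session above, pasted as text.
SAMPLE = """\
Number of GC Heaps: 1
generation 0 starts at 0x00f7a9b0
generation 1 starts at 0x00e79c3c
generation 2 starts at 0x00b21000
ephemeral segment allocation context: none
 segment    begin allocated     size
00b20000 00b21000  010029bc 0x004e19bc(5118396)
Large object heap starts at 0x01b21000
 segment    begin allocated     size
01b20000 01b21000  01b8ade0 0x00069de0(433632)
"""

if __name__ == "__main__":
    print(count_heap_segments(SAMPLE))  # {'soh': 1, 'loh': 1}
```

Running it against dumps taken a few hours apart gives a quick way to see whether the LOH segment count keeps climbing.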

Before doing this, however, I would be interested to know:

  1. Does the application use string.Intern()?
  2. Does the application have transient objects that subscribe to events with long-lived objects?

(I ask because 1. the .NET string intern tables are implemented in a way that can cause LOH fragmentation, and 2. an event subscription provides an additional root for the subscribing object, which is easy to forget.)
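To make point 2 concrete, here is a language-neutral analogue written in Python rather than C#. The class names Publisher and TransientSubscriber are hypothetical; the publisher's handler list plays the role of a .NET event's delegate list, which holds a reference to each subscriber until it unsubscribes.

```python
import gc
import weakref


class Publisher:
    """Long-lived object exposing a simple event (a list of callbacks)."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def unsubscribe(self, handler):
        self._handlers.remove(handler)  # bound methods compare equal by object and function

    def raise_event(self, payload):
        for handler in list(self._handlers):
            handler(payload)


class TransientSubscriber:
    """Short-lived object that subscribes to the publisher's event."""

    def __init__(self, publisher):
        self.buffer = bytearray(100_000)  # stands in for whatever the object holds
        self.publisher = publisher
        publisher.subscribe(self.on_event)  # the bound method references self

    def on_event(self, payload):
        pass

    def close(self):
        # Analogue of `publisher.SomeEvent -= handler` in C#.
        self.publisher.unsubscribe(self.on_event)


def subscriber_survives(unsubscribe: bool) -> bool:
    """Drop our only direct reference to a subscriber and report whether it lives on."""
    publisher = Publisher()  # stays alive for the whole function, like a long-lived object
    sub = TransientSubscriber(publisher)
    probe = weakref.ref(sub)
    if unsubscribe:
        sub.close()
    del sub
    gc.collect()
    return probe() is not None


if __name__ == "__main__":
    print(subscriber_survives(unsubscribe=False))  # True: kept alive by the handler list
    print(subscriber_survives(unsubscribe=True))   # False: collectable once unsubscribed
```

The same pattern in a service (for example, per-request objects subscribing to a singleton's event) produces exactly the "still rooted when it shouldn't be" leak described above.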

Paul Ruane
Hi Paul, can CDB or WinDbg attach to a running process?
Captain Comic
Yes, run 'CDB -pv -p 1234' where 1234 is the process ID.
Paul Ruane
(Also, 'q' exits the debugger and unfreezes your process.)
Paul Ruane
+1, WinDbg FTW.
Paolo
A: 

If your percentage of time spent in GC is high, I would look at the LOH allocations perfmon counter. Frequent LOH allocations make the GC work hard, because the LOH is only collected as part of generation 2 (full) collections; that would explain the high percentage of time spent in GC.
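To get a feel for which allocations count as LOH allocations, here is a rough Python sketch of the size check. It assumes the documented 85,000-byte threshold and approximate array header sizes of 12 bytes on x86 and 24 bytes on x64; the header figures and helper names are assumptions for illustration, and the runtime has special cases that this ignores.

```python
LOH_THRESHOLD_BYTES = 85_000  # objects of at least this many bytes go on the LOH

# Assumed approximate header sizes for a one-dimensional array of primitives
# (sync block index + method table pointer + length, with padding on x64).
ARRAY_HEADER_BYTES = {"x86": 12, "x64": 24}


def array_size_bytes(length: int, element_size: int, platform: str) -> int:
    """Approximate total object size of a primitive array."""
    return ARRAY_HEADER_BYTES[platform] + length * element_size


def goes_to_loh(length: int, element_size: int, platform: str) -> bool:
    return array_size_bytes(length, element_size, platform) >= LOH_THRESHOLD_BYTES


def count_loh_allocations(buffer_lengths, element_size=1, platform="x64"):
    """How many of the given buffer allocations would land on the LOH."""
    return sum(goes_to_loh(n, element_size, platform) for n in buffer_lengths)


if __name__ == "__main__":
    lengths = [4096, 65536, 131072, 84_976]
    print(count_loh_allocations(lengths, platform="x64"))  # 2
    print(count_loh_allocations(lengths, platform="x86"))  # 1
```

Under these assumptions a byte buffer sized just under the threshold on x86 can cross it on x64 because of the larger header, so the same code can allocate on the LOH more often in the x64 build.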

I blogged about identifying high CPU in GC caused by the LOH; the post shows how to get the exact call stack that is allocating on the LOH.

Hope this helps.

Naveen