We are developing a rather large Windows Forms application. On several customers' computers it often crashes with an OutOfMemoryException. I obtained a full memory dump of the application moments after the exception (clrdump invoked from an UnhandledException handler) and analyzed it with .NET Memory Profiler and WinDbg.
The Memory Profiler shows only 130MB in live object instances. Interestingly, for many object types it shows a very large number of unreachable instances (e.g. 22000 unreachable Byte[] instances). Its native memory statistics total 127MB across all heaps for Data (which is OK), but report 133MB unreachable in the gen #2 heap and 640MB unreachable in the large object heap (not OK!).
Analyzing the dump with WinDbg confirms these numbers:
!dumpheap -stat
      MT    Count    TotalSize Class Name
..... acceptable object sizes...
79330a00   467216     30638712 System.String
0016d488     4804    221756612 Free
79333470    27089    574278304 System.Byte[]
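Dividing TotalSize by Count gives the average instance size per type. A small helper that does this for the rows quoted above is sketched below. It assumes the SOS row layout MT, Count, TotalSize, Class Name, and it skips header and elision lines.

```python
# Rows copied from the !dumpheap -stat output above: MT, Count, TotalSize (bytes), Class Name.
ROWS = """\
79330a00   467216     30638712 System.String
0016d488     4804    221756612 Free
79333470    27089    574278304 System.Byte[]
"""


def parse_stat_rows(text):
    """Return {class_name: (count, total_bytes)} for every numeric row."""
    stats = {}
    for line in text.splitlines():
        parts = line.split(None, 3)  # class names may contain spaces, so split at most 3 times
        # Skip header lines ("MT Count ...") and elisions ("..... acceptable object sizes...").
        if len(parts) != 4 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue
        _mt, count, total, name = parts
        stats[name] = (int(count), int(total))
    return stats


def average_size(stats, name):
    """Average instance size in bytes for one row (TotalSize / Count)."""
    count, total = stats[name]
    return total / count


if __name__ == "__main__":
    stats = parse_stat_rows(ROWS)
    for name, (count, total) in stats.items():
        # MB here means 2**20 bytes.
        print(f"{name:14} {total / 2**20:7.1f} MB in {count:7} objects, "
              f"avg {average_size(stats, name):8.0f} bytes")
```

For System.Byte[] this comes out at roughly 21,200 bytes per array. That is below the 85,000-byte size at which .NET allocates arrays on the LOH, so the average alone cannot show how many of these arrays actually live there. `!dumpheap -stat -min 85000` restricts the statistics to objects of at least that size.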
The application does use a large number of short buffers during its run time, but it does not leak them. Running !gcroot on many of the Byte[] instances finds no roots, so most of those arrays are evidently unreachable, as the memory profiler indicated.
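An empty !gcroot result is a reachability statement: no chain of references leads from a root to the object. That question can be modeled over a toy object graph. In the Python sketch below, the object names and the dictionary-based graph are invented for illustration. The real GC and !gcroot walk the managed heap, thread stacks, statics and handles.

```python
def reachable(roots, refs):
    """Return the set of objects reachable from `roots` (a mark phase in miniature).

    refs maps each object id to the list of object ids it references.
    """
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(refs.get(obj, ()))
    return marked


def path_to_root(target, roots, refs):
    """Greatly simplified !gcroot analogue: one root->target chain, or None."""
    for root in roots:
        stack = [(root, [root])]
        seen = set()
        while stack:
            obj, path = stack.pop()
            if obj == target:
                return path
            if obj in seen:
                continue
            seen.add(obj)
            for child in refs.get(obj, ()):
                stack.append((child, path + [child]))
    return None


if __name__ == "__main__":
    refs = {
        "form": ["cache"],
        "cache": ["buf1"],
        "buf2": [],  # released by the application: nothing points to it any more
    }
    roots = ["form"]
    heap = {"form", "cache", "buf1", "buf2"}
    live = reachable(roots, refs)
    print("unreachable:", sorted(heap - live))  # unreachable: ['buf2']
    print(path_to_root("buf1", roots, refs))    # ['form', 'cache', 'buf1']
    print(path_to_root("buf2", roots, refs))    # None
```

In this model an object with no root path is garbage, but nothing is freed until a collection actually examines the part of the heap it lives in. Many unrooted arrays in a dump therefore means they are collectable, not that a collection has already reclaimed them.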
To make sure everything else is fine, I checked !finalizequeue. It shows that no objects are waiting to be finalized:
generation 0 has 138 finalizable objects (18bd1938->18bd1b60)
generation 1 has 182 finalizable objects (18bd1660->18bd1938)
generation 2 has 75372 finalizable objects (18b87cb0->18bd1660)
Ready for finalization 0 objects (18bd1b60->18bd1b60)
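The address ranges in this output also allow a quick consistency check. Dividing each range's length in bytes by its object count gives 4 bytes per entry for every generation (e.g. 0x18bd1b60 - 0x18bd1938 = 552 bytes for 138 objects). That fits a queue of 4-byte object pointers, i.e. a 32-bit process. This is an inference from the arithmetic, not something the command states. A small parser that performs the check:

```python
import re

# The !finalizequeue lines quoted above.
OUTPUT = """\
generation 0 has 138 finalizable objects (18bd1938->18bd1b60)
generation 1 has 182 finalizable objects (18bd1660->18bd1938)
generation 2 has 75372 finalizable objects (18b87cb0->18bd1660)
Ready for finalization 0 objects (18bd1b60->18bd1b60)
"""

LINE_RE = re.compile(
    r"(?P<label>generation \d+ has|Ready for finalization) (?P<count>\d+) "
    r"(?:finalizable )?objects \((?P<start>[0-9a-f]+)->(?P<end>[0-9a-f]+)\)"
)


def parse(text):
    """Return (label, count, start_address, end_address) for each recognized line."""
    return [
        (m["label"], int(m["count"]), int(m["start"], 16), int(m["end"], 16))
        for m in LINE_RE.finditer(text)
    ]


def entry_sizes(rows):
    """Bytes per queue entry, (end - start) / count, for every non-empty range."""
    return {(end - start) / count for _, count, start, end in rows if count}


if __name__ == "__main__":
    rows = parse(OUTPUT)
    registered = sum(c for label, c, _, _ in rows if label.startswith("generation"))
    ready = next(c for label, c, _, _ in rows if label.startswith("Ready"))
    print(f"{registered} finalizable objects registered, {ready} ready for finalization")
    print("bytes per entry:", entry_sizes(rows))
```

If the process is indeed 32-bit, its user address space is normally limited to 2 GB. That leaves little room for a fragmented heap.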
A check of the native finalizer thread's stack trace also shows that it is not blocked.
At the moment I don't know how to diagnose why the GC doesn't collect this data (and I believe it would love to, since the process ran out of memory...).
Edit: Based on the input below, I read more about Large Object Heap fragmentation, and it seems that this could be the cause.
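A toy model can illustrate how fragmentation produces an out-of-memory failure even while plenty of memory is nominally free. Blocks never move, so a request succeeds only if a single hole is large enough. The sketch below is a first-fit allocator over an abstract address range in arbitrary units. `NonCompactingHeap` is a hypothetical name, and this is not how the CLR's LOH allocator works internally. In the !dumpheap output above, the 4,804 Free entries (221,756,612 bytes, about 46 KB each on average) may be holes of this kind.

```python
class NonCompactingHeap:
    """Hypothetical non-moving heap: live blocks keep their offsets forever."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = {}  # start offset -> size of a live block

    def _holes(self):
        """Yield (start, size) for every free gap between live blocks."""
        pos = 0
        for start in sorted(self.blocks):
            if start > pos:
                yield pos, start - pos
            pos = start + self.blocks[start]
        if pos < self.capacity:
            yield pos, self.capacity - pos

    def alloc(self, size):
        """First-fit allocation; raises MemoryError if no single hole fits."""
        for start, hole in self._holes():
            if hole >= size:
                self.blocks[start] = size
                return start
        raise MemoryError(f"no contiguous hole of {size} (free total {self.free_total()})")

    def free(self, start):
        del self.blocks[start]

    def free_total(self):
        return sum(size for _, size in self._holes())

    def largest_hole(self):
        return max((size for _, size in self._holes()), default=0)


if __name__ == "__main__":
    heap = NonCompactingHeap(capacity=100)
    addrs = [heap.alloc(10) for _ in range(10)]  # heap is now full
    for a in addrs[::2]:                         # free every other block
        heap.free(a)
    print(heap.free_total(), heap.largest_hole())  # 50 10
    try:
        heap.alloc(20)  # 50 units are free, but no single hole holds 20
    except MemoryError as e:
        print("OOM:", e)
```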
I have seen some advice to allocate bigger blocks of memory for this kind of data (various byte[] in my case) and manage the memory in that area myself. However, this seems like a rather hackish solution, not what I would expect to need for a not-so-special desktop application.
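The "preallocate and manage it yourself" approach can be sketched as a slab pool. One large buffer is allocated once and carved into fixed-size slots that are rented and returned, so the application stops allocating and abandoning many separate arrays. This is a minimal Python sketch: `SlabBufferPool`, `rent` and `give_back` are hypothetical names, and the sketch assumes every buffer fits in one fixed slot size. In .NET the same idea would typically be one large byte[] handed out as offset/length pairs such as ArraySegment<byte>. Newer .NET versions also ship System.Buffers.ArrayPool<T> for the rent/return pattern.

```python
class SlabBufferPool:
    """Hypothetical pool: one preallocated bytearray carved into fixed-size slots."""

    def __init__(self, slot_size, slot_count):
        self.slot_size = slot_size
        self._slab = bytearray(slot_size * slot_count)    # allocated once, never resized
        self._view = memoryview(self._slab)               # slices share memory with the slab
        self._free = list(range(slot_count - 1, -1, -1))  # free slot indices; pop() yields 0 first
        self._rented = set()

    def rent(self):
        """Return (slot_index, writable memoryview of slot_size bytes)."""
        if not self._free:
            # A real pool might grow or fall back to ordinary allocations here.
            raise MemoryError("pool exhausted")
        i = self._free.pop()
        self._rented.add(i)
        start = i * self.slot_size
        return i, self._view[start:start + self.slot_size]

    def give_back(self, i):
        """Return slot i. The caller must stop using its view; contents are not cleared."""
        if i not in self._rented:
            raise ValueError(f"slot {i} is not rented")
        self._rented.remove(i)
        self._free.append(i)


if __name__ == "__main__":
    pool = SlabBufferPool(slot_size=4096, slot_count=8)
    i, buf = pool.rent()
    buf[:5] = b"hello"  # writes go straight into the shared slab
    pool.give_back(i)
```

Fixed-size slots keep the bookkeeping trivial and cannot fragment the slab itself. The trade-off is wasted space when buffer sizes vary widely, which real pools usually address with several slot-size buckets.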
According to many Microsoft people's blog posts, the fragmentation is caused by the fact that objects on the LOH are not relocated during their lifetime. That is understandable, but it seems logical that relocation should be performed once a certain level of memory pressure is reached, such as the threat of an OOM.
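The relocation proposed here can be modeled with a toy non-moving heap plus a compaction step. Live blocks are slid down to the start of the range, which merges all holes into one, and the allocator compacts once and retries when a request does not fit. This is a self-contained illustration with made-up sizes. `CompactingHeap`, `compact` and `alloc_or_compact` are hypothetical names, and the code is not a description of the CLR's GC.

```python
class CompactingHeap:
    """Hypothetical heap: first-fit allocation plus an explicit compaction step."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = {}  # start offset -> size of a live block

    def _holes(self):
        pos = 0
        for start in sorted(self.blocks):
            if start > pos:
                yield pos, start - pos
            pos = start + self.blocks[start]
        if pos < self.capacity:
            yield pos, self.capacity - pos

    def alloc(self, size):
        for start, hole in self._holes():
            if hole >= size:
                self.blocks[start] = size
                return start
        raise MemoryError(f"no contiguous hole of {size}")

    def free(self, start):
        del self.blocks[start]

    def compact(self):
        """Slide live blocks to the start; return {old_offset: new_offset}."""
        moved, new_blocks, pos = {}, {}, 0
        for start in sorted(self.blocks):
            size = self.blocks[start]
            new_blocks[pos] = size
            moved[start] = pos  # every reference to `start` would need rewriting
            pos += size
        self.blocks = new_blocks
        return moved

    def alloc_or_compact(self, size):
        """Allocate normally; under pressure (no fitting hole), compact once and retry."""
        try:
            return self.alloc(size)
        except MemoryError:
            self.compact()
            return self.alloc(size)


if __name__ == "__main__":
    heap = CompactingHeap(capacity=100)
    addrs = [heap.alloc(10) for _ in range(10)]
    for a in addrs[::2]:  # leave five 10-unit holes
        heap.free(a)
    print(heap.alloc_or_compact(20))  # 50
    print(sorted(heap.blocks))        # [0, 10, 20, 30, 40, 50]
```

The relocation map returned by compact() stands for the expensive part in a real runtime. Every reference to a moved object must be rewritten, and copying large objects takes time, which is the reason usually given for not compacting the LOH.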
The only thing that keeps me from fully trusting that fragmentation is the cause is that so many objects on the LOH have no gcroot references. Is this because garbage collection is performed only partially, even for the LOH?
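One relevant mechanism can be modeled directly. In .NET, objects of 85,000 bytes or more are allocated on the LOH, and the LOH is collected only as part of a full (generation 2) collection. In the toy model below, all names are hypothetical and liveness is a plain set rather than a real mark phase. It shows unreachable large objects surviving gen 0 and gen 1 collections and disappearing only at gen 2.

```python
LOH_THRESHOLD = 85_000  # bytes; .NET's documented large-object size threshold


class ToyGenerationalHeap:
    def __init__(self):
        self.objects = {}  # name -> generation: 0, 1, 2, or "loh"
        self.live = set()  # names still referenced (stands in for the mark phase)

    def allocate(self, name, size):
        self.objects[name] = "loh" if size >= LOH_THRESHOLD else 0
        self.live.add(name)

    def drop(self, name):
        """The program stops referencing `name`: it is now garbage, but not yet freed."""
        self.live.discard(name)

    @staticmethod
    def _condemned(obj_gen, collected_gen):
        if obj_gen == "loh":
            return collected_gen == 2  # the LOH is only examined by full collections
        return obj_gen <= collected_gen

    def collect(self, gen):
        """Collect generations 0..gen: free unreachable objects, promote survivors."""
        for name, g in list(self.objects.items()):
            if not self._condemned(g, gen):
                continue
            if name not in self.live:
                del self.objects[name]
            elif g != "loh" and g < 2:
                self.objects[name] = g + 1


if __name__ == "__main__":
    heap = ToyGenerationalHeap()
    heap.allocate("small", 1_000)
    heap.allocate("big", 500_000)
    heap.drop("small")
    heap.drop("big")
    heap.collect(0)
    print(sorted(heap.objects))  # ['big']
    heap.collect(1)
    print(sorted(heap.objects))  # ['big']
    heap.collect(2)
    print(sorted(heap.objects))  # []
```

Under this model, unrooted LOH arrays in a dump are garbage that no gen 2 collection has reclaimed yet. That observation alone does not decide whether fragmentation or a pending full collection matters more.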
I'd be happy to be pointed to any interesting solution, since at the moment the only one I know of is custom management of a preallocated memory block.
Any ideas are welcome. Thanks.