This is by far the most complex software I've built, and now it seems to be running out of memory at some point. I haven't done extensive testing yet, because I'm a bit lost as to how I should approach the problem at hand.

HandleCount: 277
NonpagedSystemMemorySize: 48136
PagedMemorySize: 1898590208
PagedSystemMemorySize: 189036
PeakPagedMemorySize: 1938321408
VirtualMemorySize: 2016473088
PeakVirtualMemory: 2053062656
WorkingSet: 177774592
PeakWorkingSet: 883834880
PrivateMemorySize: 1898590208
PrivilegedProcessorTime: 00:00:15.8593750
UserProcessorTime: 00:00:01.6562500
TotalProcessorTime: 00:00:17.5156250
GDI Objects: 30
User Objects: 27

I have an automated global exception catcher that upon exception gathers the above information (using System.Diagnostics.Process) - along with the exception information, log and a screenshot - and e-mails me everything.
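A minimal sketch of that kind of gathering code, assuming nothing about the poster's actual implementation (the property names are real System.Diagnostics.Process members; EmailReport is hypothetical):

```csharp
using System;
using System.Diagnostics;
using System.Text;

static class CrashReporter
{
    public static string GatherProcessStats()
    {
        // Snapshot of the current process, same counters as the report above
        Process p = Process.GetCurrentProcess();
        StringBuilder sb = new StringBuilder();
        sb.AppendFormat("HandleCount: {0}\n", p.HandleCount);
        sb.AppendFormat("PagedMemorySize: {0}\n", p.PagedMemorySize64);
        sb.AppendFormat("PeakVirtualMemorySize: {0}\n", p.PeakVirtualMemorySize64);
        sb.AppendFormat("WorkingSet: {0}\n", p.WorkingSet64);
        sb.AppendFormat("PrivateMemorySize: {0}\n", p.PrivateMemorySize64);
        sb.AppendFormat("TotalProcessorTime: {0}\n", p.TotalProcessorTime);
        return sb.ToString();
    }
}

// Wired up once at startup (EmailReport is a hypothetical helper):
// AppDomain.CurrentDomain.UnhandledException +=
//     (s, e) => EmailReport(CrashReporter.GatherProcessStats(), e.ExceptionObject);
```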

This has been working nicely, as I've been able to fix bugs based on the e-mailed information. That is, up until now. The software is tens of thousands of lines and uses managed and unmanaged resources.

I could start going through the code line by line, but somehow I get the feeling this might not be the best way to track down the memory build-up problem.

As I've never done this kind of analysis before, how would you suggest approaching this kind of problem?

+2  A: 

Attach a debugger to it and reproduce the error. The call stack at exception time should tell you where the error is.

Either you have a memory leak, you're not disposing your objects, or you need better hardware :)

tsilb
IMHO, in this case catching the exception in the debugger will be useless; the damage (most probably a memory leak) has already been done somewhere else.
Naveen
Just throwing out a couple options. Can't hurt to look.
tsilb
tsilb has a point, sometimes you can narrow down when you are running out of memory.
Gregory
++ Assuming you can enter the debugger when the exception is thrown, chances are good that the allocation that fails is the one causing the memory leak.
Mike Dunlavey
+8  A: 

There are a couple of options. Dedicated memory profilers such as ANTS Memory Profiler from RedGate can be very useful for troubleshooting this kind of problem.

If you don't want to spend money on a dedicated tool, you can also use WinDbg (part of Debugging tools for Windows, a free download from Microsoft). It can show you heap usage for the managed heap, the various AppDomain heaps and so forth.
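For the managed heap, a typical WinDbg session with the SOS extension looks something like this (commands shown for the .NET 2.x runtime, where SOS lives alongside mscorwks; `$$` introduces a comment, and exact addresses and type names will of course differ):

```
.loadby sos mscorwks      $$ load the SOS extension for the .NET 2.x runtime
!eeheap -gc               $$ GC heap generation and segment sizes
!dumpheap -stat           $$ per-type object counts and total sizes
!dumpheap -type MyType    $$ addresses of instances of a suspect type
!gcroot <address>         $$ what is keeping a given instance alive
!address -summary         $$ overall virtual address space usage
```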

Have a look at this blog for hints on using WinDbg.

Keep in mind that troubleshooting out of memory can be hard, as you usually don't see the actual problem but merely a symptom. So unlike a crash where the call stack will give you a pretty good indication of the source of the problem, the call stacks for a process with OOM may reveal very little.

In my experience you have to look at where memory is used. It could be on the managed heap, in which case you have to find out if something is holding on to instances longer than necessary. However, it could also be related to loading lots of assemblies (typically assemblies generated on the fly).

Brian Rasmussen
+1 on ANTS memory profiler!
tijmenvdk
+1 For just posting something useful.
Gregory
ANTS, great tool.
BennyM
+3  A: 

Take a look at this MSDN article about detecting memory leaks in .NET applications.

Perhaps you have some issues where memory is getting allocated and never collected.

joshperry
+1  A: 

Your PeakWorkingSet is right around the number where 32-bit CLR processes commonly start to bomb out.

Despite what people tell you, and despite the huge irony of automatic memory management, you have to be aware of this limit and make sure you never approach it on 32-bit systems. Many developers are unaware of it (and I usually pick up downvotes for calling out C# bloat), but run a few such apps on a single desktop and you can expect some havoc. Just watch the managed portion of a Visual Studio shutdown; it's like a train running through a PC.

There is a free MemProfiler for .NET; use it and look for the hanging roots. Eventually, especially as you start dealing with moderately sized data, you will have to design for streaming rather than relying on the app running on x64 with more RAM.

And a dataset of around 880 MB is small by today's standards. Fact!

[Peace to C# 3.0 sheep]

rama-jka toti
+10  A: 

We provide a tool for that.

http://msdn.microsoft.com/en-us/library/ms979205.aspx

CLR Profiler enables you to look at the managed heap of a process and investigate the behavior of the garbage collector. Using the various views in the tool, you can obtain useful information about the execution, allocation, and memory consumption of your application.

Using CLR Profiler, you can identify code that allocates too much memory, causes too many garbage collections, and holds on to memory for too long.

Eric Lippert
A: 

Perhaps you should check the places where you use unmanaged resources, first. The problem might be that you don't release them, or you don't do it correctly.
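The usual way to make that release deterministic is the dispose pattern plus `using`. A minimal sketch, assuming a hypothetical wrapper class (the Marshal calls are real .NET APIs):

```csharp
using System;
using System.Runtime.InteropServices;

sealed class NativeBuffer : IDisposable
{
    private IntPtr _ptr;

    public NativeBuffer(int bytes)
    {
        _ptr = Marshal.AllocHGlobal(bytes);  // unmanaged allocation
    }

    public void Dispose()
    {
        if (_ptr != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_ptr);       // release it exactly once
            _ptr = IntPtr.Zero;
        }
        GC.SuppressFinalize(this);
    }

    ~NativeBuffer() { Dispose(); }           // safety net if Dispose is missed
}

// using guarantees Dispose runs even when an exception is thrown:
// using (NativeBuffer buf = new NativeBuffer(4096)) { /* work with buf */ }
```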

Venemo
+2  A: 

I have exactly the same application. :) Our application used to take up to 10 GB of RAM. This is obviously bad. After some optimization I managed to decrease memory usage about 50 times, so now the same data set takes up to 200 MB. Magic? No. :) What I did:

  1. Some data was stored in memory several times (several copies). I kept just one copy of each bunch of data.
  2. Some data was stored as strings, but storing it as int is more efficient because those strings contained only digits.
  3. The main data storage class was Dictionary<uint,uint>. We wrote our own dictionary which does not store any hashes; as a result, memory usage decreased 3 times on 64-bit systems and 2 times on 32-bit systems.
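Point 2 is usually the cheapest win. A rough sketch of the idea (the size figures are approximate for a 32-bit CLR):

```csharp
// Storing digit-only strings as integers instead.
// A 9-character string object costs roughly 40 bytes on the managed
// heap; the same value held as a uint costs 4 bytes.
string s = "123456789";
uint packed = uint.Parse(s);          // store this in your collections
string display = packed.ToString();   // materialize a string only on demand
```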

So my question is: what is the main class/object you use to store data? What kind of data do you store?

Vasiliy Borovyak
A: 

A lot of useful solutions have already been suggested, and the MSDN article is very thorough. In conjunction with the suggestions above, I would also do the following:

Correlate the time of the exception with your log file to see what was going on at the time of the OOM exception. If you have little logging at info or debug level I would suggest adding some logging so you have an idea of the context around this error.

Does the memory usage gradually increase over a long period of time before the exception (e.g. a server process that runs indefinitely), or does it jump up in large increments quite quickly until the exception? Are lots of threads running, or just one?

If the former is true and the exception doesn't occur for a long time, it would imply that resources are leaking, as stated above. If the latter is true, a number of things could contribute to the cause, e.g. a loop that allocates a lot of memory per iteration, receiving a very large result set from a service, etc.

Either way the log file should provide you with enough information on where to start. From there I would ensure I could recreate the error either by issuing a certain set of commands in the interface or by using a consistent set of inputs. After that depending on the state of the code I would try (with the use of the log file info) to create some integration tests that targeted the assumed source of the problem. This should allow you to recreate the error condition much faster and make it a lot easier to find as the code you are concentrating on will be a lot smaller.

Another thing I tend to do is surround memory-sensitive code with a small profiling class. This can log memory usage to the log file and give you immediate visibility of problems in the log. The class can be optimized so it's not compiled into release builds or has a tiny performance overhead (if you need more info, contact me). This type of approach doesn't work well when lots of threads are allocating memory concurrently.
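A minimal sketch of such a profiling class, assuming a hypothetical `MemoryProbe` name (note that GC.GetTotalMemory gives only an approximate managed-heap figure, which is why this breaks down when many threads allocate concurrently):

```csharp
using System;

sealed class MemoryProbe : IDisposable
{
    private readonly string _label;
    private readonly long _before;

    public MemoryProbe(string label)
    {
        _label = label;
        _before = GC.GetTotalMemory(false);  // approximate managed-heap bytes
    }

    public void Dispose()
    {
        long delta = GC.GetTotalMemory(false) - _before;
        Console.WriteLine("{0}: ~{1} bytes allocated", _label, delta);
    }
}

// Wrap a memory-sensitive operation (LoadCustomers is hypothetical):
// using (new MemoryProbe("LoadCustomers"))
// {
//     LoadCustomers();
// }
```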

You mentioned unmanaged resources; I assume all the code you / your team has written is managed? If not, and if possible, I would surround the unmanaged boundaries with a profiling class similar to the one mentioned above to rule out leaks from unmanaged code or interop. Pinning lots of unmanaged pointers can also cause heap fragmentation, but if you have no unmanaged code both of these points can be ignored.

Explicitly calling the garbage collector was discouraged in an earlier comment. Although you should rarely do this, there are times when it is valid (search Rico Mariani's blog for examples). One example (covered in that blog) where I have explicitly called collect is when large amounts of strings had been returned from a service, put into a dataset and then bound to a grid; even after the screen was closed, this memory wasn't collected for some time. In general the collector shouldn't be called explicitly, because it maintains metrics on which it bases (among other things) its collection decisions, and calling collect explicitly invalidates those metrics.

Finally, it is generally good to have an idea of the memory requirements of your application. Obtain this by logging more information, occasionally running the profiler, and stress / unit / integration tests. Get a high-level sense of the impact a given operation will have, e.g. for a certain set of inputs roughly x will be allocated. I gain this understanding by logging detailed information at strategic points in the log file, though a bloated log file can be hard to understand or interpret.

Ian Gibson