I'm having problems with a slow memory leak in my mixed-mode C++/CLI .NET application.

(It's C++ native static libraries linked into a VS2008 C++/CLI Windows Forms app with the "/clr" compiler setting)

Typical behaviour: the app starts out using 30 MB (private memory). Then it leaks memory slowly, roughly 1 MB every hour when running under simulated heavy load. This simulates the app being live for days or weeks.

I've tried using several tools to track down memory leaks, including the CRT debugging facilities that come with the Visual Studio CRT libs. I've also used a commercial leak detection tool ("Memory Validator").

Both report negligible memory leaks at shutdown (a few minor entries that amount to a few KB that I'm not worried about). Also, I can see when running that the tracked memory doesn't seem to amount to that much (so I don't believe it is just memory that is being held and only released at app exit). I get around 5 MB of listed memory (out of total > 30MB).

The tool (Memory Validator) is set up to track all memory usage (including malloc, new, virtual memory allocation and a whole bunch of other types of memory allocation). Basically, every setting for which memory to track has been selected.

The .NET image reports that it is using around 1.5 MB of memory (from perfmon).

Here's a final bit of information: we have a version of the app which runs as a native console application (purely native - not CLR at all). This is 95% the same as the mixed mode except without the UI stuff. This doesn't seem to leak memory at all, and peaks at around 5MB private bytes.

So basically what I'm trying to get across here is that I don't think any of the native code is leaking memory.

Another piece of the puzzle: I found this which refers to memory leaks in mixed mode applications when targeting 2.0 framework (which I am): http://support.microsoft.com/kb/961870

Unfortunately the details are infuriatingly sparse so I'm not sure if it's relevant. I did try targeting 3.5 framework instead of 2.0 but still had the same problem (maybe I didn't do this right).

Anyone have any suggestions?

A few things that might help me:

  • Are there any other kinds of memory allocation that I'm not tracking?
  • How come the figures don't add up? I get 5 MB of CRT memory usage, 1.5 MB of .NET memory so how come the whole app uses 30MB private bytes? Is that all tied up in the .NET framework? Why don't I see these in the leak tool? Won't the .NET framework appear as some kind of allocated memory?
  • Any other leak detection tools which work well with mixed mode apps?

Thanks for any help

John

A: 

Try out DebugDiag.
After generating some memory dumps, it will give you a nice summary of what memory was allocated and, provided it can find your PDBs, it can tell you by whom it was allocated.

Tal
Thanks I'll give that a try
John
A: 

You may have a reference leak; look into the ANTS Profiler.

A reference leak is the .NET equivalent of a memory leak: you hold references to an object, which stops it being garbage collected, and thus your memory in use starts to go up.

Spence
But wouldn't this show up in the "# Bytes in all Heaps" .NET memory value? I've been watching this in Process Explorer and it doesn't leak. The leak is in private bytes but not in "# Bytes in all Heaps".
John
A: 

Is it possible you've missed some disposers? That can happen if you're using GDI+ and many other APIs.

If you run the static analysis tool FxCop, it has a rule to check whether you've called Dispose (or used "using" statements) on your objects that provide the interface. In .NET, if a class uses unmanaged code it will usually provide a Dispose or Close method so that you don't leak the resource/memory.

Spence
Excellent idea. I guess that these leaks wouldn't appear as CLR memory usage because they're native, but also wouldn't appear in the native leak detection tool because they're not my code. I'll give that a bash (tho I don't know whether FXCop can deal with mixed mode C++/CLI)
John
Hope it helps. I'm just shooting in the dark, but I've seen this happen in one of my apps that did GDI; I found almost every object in GDI+ needed a disposer. Note that this was before the days of "using".
Spence
+3  A: 

Like Spence was saying, but for C++/CLI ;)....

For any object you use in C++/CLI, and especially if you create more of those objects from your C++ code, you should try to use stack allocation semantics. Even though this is a compiler-magic sort of thing, it sets up the nested __try {} __finally {} statements you may be used to from native code (that is, it sets them up so that a call to Dispose is never lost).

Nish's article on The Code Project here about C++/CLI stack allocation semantics is pretty good and goes into depth about how to emulate using{}.

You should also make sure to delete any objects that implement IDisposable if you're not using stack semantics: you cannot call Dispose directly in C++/CLI, and delete does it for you.

I usually call Close myself on Streams and try to assign nullptr when I am finished with objects, just in case.

You may also want to check out this article on memory issues, particularly the part about event subscribers: if you are attaching event handlers to your objects, you may be leaking...

As a last resort (or maybe first :), one thing I have done in the past is make use of the CLR profiler API. Here's another article on how to do this; the author (Jay Hilyard) has an example that answers:

  • Of each .NET type that is used, how many object instances are being allocated?
  • How big are the instances of each type?
  • What notifications does the GC provide as it goes through a garbage collection and what can you find out?
  • When does the GC collect the object instances?

That should give you a better idea than some commodity profiler; I've noticed that they can occasionally be misleading depending on your allocation profile. (By the way, watch out for large object heap issues: objects larger than ~83 KB are handled specially. In that case, I'd recommend getting out of the large object heap. :)

Given your comments, a few more things...

I've posted before about image loads not charging quota or any other discernible statistic. What this means is that you may need to track down some handle or loader issue (see loader lock, eventually). Before that, though, you can try setting up some Constrained Execution Regions; they can work wonders, but are unfortunately difficult to retrofit into non-pure code.

This recent MSDN Magazine article documents a lot of perfmon-type memory spelunking (a follow-up to this older one).

The VS Perf Blog shows how to use SOS in Visual Studio, which can be handy for tracking down rogue DLLs; the related posts are also good.

Maoni Stephens' blog and company: the blog says its author is on the perf team, but essentially 100% of the posts are about the GC, so much so that they may as well have written it.

Rick Byers is a dev with the CLR diagnostics team, and many of his blog buddies are also good sources. However, I would strongly suggest also referring to the quite new dev/diagnostics forum; they have recently expanded the scope of their discussions.

Code coverage tools and tracing can often help by giving you an overview of what's actually running.

(Specifically, those particular stats may not be giving you a global view of what is plaguing your code. I can say that recently I have found, even with .NET 4 beta binaries, that the profiler from this company is quite good: it is capable of deriving native/managed leaks from its profile traces and brings you back to the exact source lines, even if optimized. Quite nice, and it has a 30-day trial.)

Good luck!! Hope some of this helps, it's only fresh in my mind, as I am doing much of the same work right now ;)

RandomNickName42
Thanks - some good comments here. Here's the thing: neither the "# Total committed Bytes" nor the "Large Object Heap size" measures in the .NET performance objects seem to indicate a leak in .NET objects. Now, I'm much more familiar with debugging memory leaks in native code than CLR, so perhaps I've misunderstood what these figures mean. Could you correct me on this? Would CLR memory leaks appear in the aforementioned performance objects?
John
+1  A: 

OK I finally found the problem.

It was caused by an incorrect setting for /EH (Exception Handling).

Basically, with mixed mode .NET apps, you need to make sure all statically linked libs are compiled with /EHa instead of the default /EHs.

(The app itself must also be compiled with /EHa, but this is a given - the compiler will report an error if you don't use it. The problem is when you link in other static native libs.)

The problem is that exceptions thrown within native libraries compiled with /EHs and caught in the managed part of the app are not handled correctly: destructors for C++ objects are then not called as they should be.

In my case, this only occurred in a rare place, hence why it took me ages to spot.

John