My program, alas, has a memory leak somewhere, but I'll be damned if I know what it is.

Its job is to read in a bunch of ~2MB files, do some parsing and string replacement, then output them in various formats. Naturally, this means a lot of strings, and so doing memory tracing shows that I have a lot of strings, which is exactly what I'd expect. The structure of the program is a series of classes (each in their own thread, because I'm an idiot) that acts on an object that represents each file in memory. (Each object has an input queue that uses a lock on both ends. While this means I get to run this simple processing in parallel, it also means I have multiple 2MB objects sitting in memory.) Each object's structure is defined by a schema object.

My processing classes raise events when they've done their processing and pass a reference to the large object that holds all my strings to add it to the next processing object's queue. Replacing the event with a function call to add to the queue does not stop the leak. One of the output formats requires me to use an unmanaged object. Implementing Dispose() on the class does not stop the leak. I've replaced all the references to the schema object with an index name. No dice. I got no idea what's causing it, and no idea where to look. The memory trace doesn't help because all I see are a bunch of strings being created, and I don't see where the references are sticking in memory.

We're pretty much going to give up and roll back at this point, but I have a pathological need to know exactly how I messed this up. I know Stack Overflow can't exactly comb my code, but what strategies can you suggest for tracking this leak down? I'm probably going to do this in my own time, so any approach is viable.

+8  A: 

One technique I would try is to systematically reduce the amount of code you need to demonstrate the problem without making the problem go away. This is informally known as "divide and conquer" and is a powerful debugging technique. Once you have a small example that demonstrates the same problem, it will be much easier for you to understand. Perhaps the memory problem will become clearer at that point.

Greg Hewgill
+1 I call this "binary search", because I disable half the code and test to see if the problem still exists. Repeat with the other half. Assuming I get consistent results, I can now disable half of the half that contains the problem, and so on until I've isolated the cause.
Daniel Earwicker
+5  A: 

There is only one person who can help you. That person's name is Tess Ferrandez. (hushed silence)

But seriously, read her blog (the first article is pretty pertinent). Seeing how she debugs this stuff will give you a lot of insight into what's going on with your problem.

Dave Markle
+2 (if I could). Link to most relevant entry: http://blogs.msdn.com/tess/archive/2009/02/27/net-memory-leak-reader-email-are-you-really-leaking-net-memory.aspx
Richard
+2  A: 

I like the CLR Profiler from Microsoft. It provides some great tools for visualizing the managed heap and tracking down leaks.

Mo Flanagan
A: 
  1. Add code to the constructor of the unmanaged object to log when it's constructed, and store a unique ID. Use that unique ID when the object is destroyed again, and you can at least tell which ones are going astray.
  2. Grep the code for every place you construct a new object; follow that code path to see if you have a matching destroy.
  3. Add chaining pointers to the constructed objects, so you have a link to the object constructed before and after the current one. Then you can sweep through them later.
  4. Add reference counters.
  5. Is there a "debug malloc" available?
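A minimal sketch of points 1 and 2, assuming the unmanaged object is wrapped in a disposable C# class. `TrackedResource` and its labels are hypothetical names, not anything from the question's code. Each instance takes a unique ID on construction and removes it on disposal, so whatever is still in the table is what went astray.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

// Hypothetical wrapper around the unmanaged object; names are illustrative only.
public sealed class TrackedResource : IDisposable
{
    private static int _nextId;

    // IDs (and labels) of instances constructed but not yet disposed.
    private static readonly ConcurrentDictionary<int, string> Live =
        new ConcurrentDictionary<int, string>();

    private int _disposed;

    public int Id { get; }

    public TrackedResource(string label)
    {
        Id = Interlocked.Increment(ref _nextId);
        Live[Id] = label;
        Console.WriteLine($"constructed #{Id} ({label})");
    }

    public void Dispose()
    {
        // Ignore repeated Dispose calls so the log stays accurate.
        if (Interlocked.Exchange(ref _disposed, 1) != 0) return;
        Live.TryRemove(Id, out _);
        Console.WriteLine($"disposed #{Id}");
    }

    // Snapshot of the IDs that are still outstanding.
    public static int[] LiveIds()
    {
        return Live.Keys.OrderBy(id => id).ToArray();
    }
}

public static class Program
{
    public static void Main()
    {
        var a = new TrackedResource("file-a");
        var b = new TrackedResource("file-b");
        a.Dispose();
        // b was never disposed, so its ID is the one reported here.
        Console.WriteLine("still live: " + string.Join(", ", TrackedResource.LiveIds()));
    }
}
```

Dumping `LiveIds()` after a batch of files has gone through every stage gives you a concrete list of instances to chase instead of a wall of strings.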
Charlie Martin
A: 

The managed debugging add-in SOS (Son of Strike) is immensely powerful for tracking down managed memory 'leaks', since they are, by definition, discoverable from the GC roots.

It will work in WinDbg or Visual Studio (though it is in many respects easier to use in WinDbg).

It is not at all easy to get to grips with. Here is a tutorial

I would second the recommendation to check out Tess Ferrandez's blog too.

ShuggyCoUk
A: 

I use the dotTrace profiler for tracking down memory leaks. It's far more deterministic than methodical trial and error, and it turns up results much faster.

For any actions that the system performs, I take a snapshot then run a few iterations of the function, then take another snapshot. Comparing the two will show you all the objects that were created in between but were not freed. You can then see the stack frame at the point of their creation, and therefore work out what instances are not being freed.
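If a profiler isn't to hand, the same before/after idea can be roughed out in code. This is only a sketch, not what dotTrace does internally: it compares the settled managed heap size around a few iterations of one stage. `LeakCheck`, `RetainedBytesPerIteration` and the two lambdas are hypothetical stand-ins for your real processing stages, and it tells you how much is retained, not which objects or where they were allocated.

```csharp
using System;
using System.Collections.Generic;

public static class LeakCheck
{
    // Approximate managed bytes still reachable per iteration of 'action'.
    public static long RetainedBytesPerIteration(Action action, int iterations)
    {
        action(); // warm-up, so one-time allocations (JIT, caches) are not counted
        long before = SettledHeapSize();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        long after = SettledHeapSize();
        return (after - before) / iterations;
    }

    // Force a full collection so only reachable objects are measured.
    private static long SettledHeapSize()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return GC.GetTotalMemory(true);
    }
}

public static class Program
{
    // Deliberately leaky: a static list keeps every buffer reachable.
    private static readonly List<byte[]> Retained = new List<byte[]>();

    public static void Main()
    {
        long leaky = LeakCheck.RetainedBytesPerIteration(
            () => Retained.Add(new byte[100000]), 50);
        long clean = LeakCheck.RetainedBytesPerIteration(
            () => { var tmp = new byte[100000]; tmp[0] = 1; }, 50);

        Console.WriteLine($"leaky stage: ~{leaky} bytes retained per iteration");
        Console.WriteLine($"clean stage: ~{clean} bytes retained per iteration");
    }
}
```

Running each stage through a check like this in isolation narrows down which one holds on to its input, at which point a real profiler snapshot of just that stage is much easier to read.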

Drew Noakes
A: 

How do you know for a fact that you actually have a memory leak?

One other thing: you write that your processing classes are using events. When an object registers a handler on an event, the object that owns the event holds a reference to the subscriber, so the GC cannot collect the subscriber while the event's owner is still alive. Make sure you de-register all event handlers if you want your objects to be garbage collected.
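Here is a small sketch of how that plays out. `Stage` and `FileDocument` are hypothetical stand-ins for a long-lived processing class and one of the large per-file objects. The stage's delegate list references every document that subscribed and never unsubscribed.

```csharp
using System;

// Hypothetical long-lived processing stage that raises an event.
public sealed class Stage
{
    public event EventHandler Finished;

    public void RaiseFinished()
    {
        Finished?.Invoke(this, EventArgs.Empty);
    }

    // Number of handlers (and therefore subscriber objects) the stage is holding.
    public int SubscriberCount
    {
        get { return Finished == null ? 0 : Finished.GetInvocationList().Length; }
    }
}

// Hypothetical large per-file object that listens to the stage.
public sealed class FileDocument : IDisposable
{
    private readonly Stage _stage;
    private readonly char[] _text = new char[1000000]; // stands in for ~2MB of strings

    public FileDocument(Stage stage)
    {
        _stage = stage;
        _stage.Finished += OnFinished; // the stage now references this document
    }

    private void OnFinished(object sender, EventArgs e) { }

    public void Dispose()
    {
        _stage.Finished -= OnFinished; // drop the stage's reference to this document
    }
}

public static class Program
{
    public static void Main()
    {
        var stage = new Stage();

        for (int i = 0; i < 3; i++)
        {
            var leaked = new FileDocument(stage); // never unsubscribed
        }
        Console.WriteLine(stage.SubscriberCount); // 3: all three documents stay reachable

        using (var doc = new FileDocument(stage)) { }
        Console.WriteLine(stage.SubscriberCount); // still 3: the disposed one unsubscribed
    }
}
```

As long as `stage` is alive, the three undisposed documents and their buffers cannot be collected, even though nothing else in the program refers to them.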

Jakob Christensen
I wait until the program is using half a gigabyte of memory. Then I'm pretty sure.
Merus
This is a bit flippant, so: there's one scenario where I can run it so that it's reading in data, parsing it, and reading it out, and it's using only about 50MB. This is fine. If I add other stages, memory usage balloons.
Merus
+1  A: 

Get this: http://www.red-gate.com/Products/ants_profiler/index.htm

The memory and performance profiling are awesome. Being able to actually see proper numbers instead of guessing makes optimisation pretty fast. I've used it quite a bit at work for reducing the memory footprint of our main app.

Jamie Penney
Tried ANTS and it was terrible. It slowed performance to a crawl, which made it even more difficult to follow what was going on.
Merus
Every profiler I've tried has slowed performance. You are going to spend a lot of time searching for possible answers unless you use a tool, and you'll probably overlook something simple.
Jamie Penney
A: 

If your unmanaged object really is the cause of the leak, you may want to have it call GC.AddMemoryPressure when it allocates unmanaged memory and GC.RemoveMemoryPressure in Finalize/Dispose/wherever it deallocates the unmanaged memory. This will give the GC a better handle on the situation, because it may not realize there's a need to schedule a collection otherwise.
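A minimal sketch of that pairing, assuming the unmanaged memory is something you allocate yourself. `NativeBuffer` is a hypothetical wrapper and `Marshal.AllocHGlobal` stands in for whatever the real unmanaged object allocates. Note that the pressure only influences when collections run; it does not make this particular object a target for collection.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper that reports its unmanaged allocation to the GC.
public sealed class NativeBuffer : IDisposable
{
    private IntPtr _ptr;
    private readonly long _size;

    // 'size' must be positive: GC.AddMemoryPressure rejects zero or negative values.
    public NativeBuffer(int size)
    {
        _ptr = Marshal.AllocHGlobal(size);
        _size = size;
        GC.AddMemoryPressure(_size); // tell the GC about memory it cannot see
    }

    public bool IsAllocated
    {
        get { return _ptr != IntPtr.Zero; }
    }

    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this);
    }

    // Safety net in case Dispose is never called.
    ~NativeBuffer()
    {
        Release();
    }

    private void Release()
    {
        if (_ptr == IntPtr.Zero) return; // already released
        Marshal.FreeHGlobal(_ptr);
        _ptr = IntPtr.Zero;
        GC.RemoveMemoryPressure(_size); // must match the amount added
    }
}

public static class Program
{
    public static void Main()
    {
        using (var buffer = new NativeBuffer(2 * 1024 * 1024))
        {
            Console.WriteLine("allocated: " + buffer.IsAllocated);
        }
    }
}
```

The add and remove calls have to balance exactly, which is why both sit next to the allocation and the free rather than anywhere else in the class.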

Logan Capaldo
Strictly speaking it is less likely to trigger a garbage collection. There is nothing about the added memory pressure which makes the object a target for GC specifically.
ShuggyCoUk
Feel free to just edit the answer to be correct without quoting me. I won't be offended and it makes the answer much more readable. I can always delete the comment if it becomes out of place...
ShuggyCoUk
I just try to give credit where credit is due. But I agree it's more readable otherwise.
Logan Capaldo
A: 

Be careful how you define "leak". "Uses more memory" or even "uses too much memory" is not the same as "memory leak". This is especially true in a garbage-collected environment. It may simply be that GC hasn't needed to collect the extra memory you're seeing used. Also be careful about the difference between virtual memory use and physical memory use.
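To see those distinctions in numbers, something like the following can be logged at intervals. It is a sketch using standard `GC` and `System.Diagnostics.Process` counters; `MemorySnapshot` is a hypothetical helper name. A large gap between the managed size before and after a forced collection means you were looking at garbage that simply had not been collected yet.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper that records the different "memory used" figures side by side.
public sealed class MemorySnapshot
{
    public long ManagedBeforeCollect; // includes garbage not yet collected
    public long ManagedAfterCollect;  // what is actually still reachable
    public long WorkingSet;           // physical memory currently in use by the process
    public long PrivateBytes;         // memory committed to this process alone
    public long VirtualBytes;         // reserved address space, usually far larger

    public static MemorySnapshot Capture()
    {
        var snapshot = new MemorySnapshot();
        snapshot.ManagedBeforeCollect = GC.GetTotalMemory(false);
        snapshot.ManagedAfterCollect = GC.GetTotalMemory(true);
        using (Process process = Process.GetCurrentProcess())
        {
            snapshot.WorkingSet = process.WorkingSet64;
            snapshot.PrivateBytes = process.PrivateMemorySize64;
            snapshot.VirtualBytes = process.VirtualMemorySize64;
        }
        return snapshot;
    }
}

public static class Program
{
    public static void Main()
    {
        // Create some short-lived garbage so the two managed figures can differ.
        for (int i = 0; i < 200; i++)
        {
            var scratch = new byte[100000];
            scratch[0] = 1;
        }

        MemorySnapshot s = MemorySnapshot.Capture();
        Console.WriteLine("managed before collect: " + s.ManagedBeforeCollect);
        Console.WriteLine("managed after collect:  " + s.ManagedAfterCollect);
        Console.WriteLine("working set:            " + s.WorkingSet);
        Console.WriteLine("private bytes:          " + s.PrivateBytes);
        Console.WriteLine("virtual bytes:          " + s.VirtualBytes);
    }
}
```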

Finally, not all "memory leaks" are caused by "memory" sorts of issues. I was once told (not asked) to fix an urgent memory leak that was causing IIS to restart frequently. I did some profiling and found I was using a lot of strings through the StringBuilder class. I implemented an object pool (from an MSDN article) for the StringBuilders, and memory usage went down substantially.

IIS still restarted just as frequently. This was because there was no memory leak. Instead, there was unmanaged code that claimed to be thread-safe but was not. Using it in a web service (multiple threads) caused it to write all over the C Runtime Library heap. Since nobody was looking for unmanaged exceptions, nobody saw this until I happened to do some profiling with AQtime from Automated QA. It has an events window, which happened to display the cries of pain from the C Runtime Library.

I placed locks around the calls to the unmanaged code, and the "memory leak" went away.
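The shape of that fix, roughly, is below. `LegacyFormatter` and `UnsafeNativeFormat` are hypothetical; the latter is a plain C# stand-in for the non-thread-safe unmanaged call, with an unsynchronized counter playing the part of the library's shared state.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical managed wrapper that serializes access to a non-thread-safe library.
public static class LegacyFormatter
{
    private static readonly object Gate = new object();
    private static int _callCount; // stands in for the library's unsynchronized state

    public static int CallCount
    {
        get { lock (Gate) { return _callCount; } }
    }

    // Stand-in for the unmanaged call; not safe to run concurrently.
    private static int UnsafeNativeFormat(int input)
    {
        _callCount++; // unsynchronized read-modify-write
        return input * 2;
    }

    // Every caller goes through this single lock, so only one thread
    // is ever inside the unsafe code at a time.
    public static int Format(int input)
    {
        lock (Gate)
        {
            return UnsafeNativeFormat(input);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        const int calls = 100000;
        Parallel.For(0, calls, i => LegacyFormatter.Format(i));
        // With the lock in place no increments are lost.
        Console.WriteLine(LegacyFormatter.CallCount == calls);
    }
}
```

The lock has to be one shared object for every call into the library, not one per instance or per thread, otherwise two threads can still be inside the unmanaged code at once.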

John Saunders
A: 

You mentioned that you're using events. Are you removing the handlers from those events when you're done with your object? I've found that 'loose' event handlers will cause a lot of memory leak problems if you add a bunch of handlers without removing them when you're done.

Justin Drury
A: 

The best memory profiling tool for .NET is this:

http://memprofiler.com

Also, while I'm here, the best performance profiler for .NET is this:

http://www.yourkit.com/dotnet/download/index.jsp

They are also great value for money, have low overhead and are easy to use. Anyone serious about .NET development should consider both of these a personal investment and purchase immediately. Both of them have a free trial.

I work on a real time game engine with over 700k lines of code written in C# and have spent hundreds of hours using both these tools. I have used the Sci Tech product since 2002 and YourKit! for the last three years. Although I've tried quite a few of the others I have always returned to these.

IMHO, they are both absolutely brilliant.