Hello,
I just spent a whole week tracking down and whacking memory leaks over the head, and I arrived on the other end of that week a little dazed. There has to be a better way to do this, is all I can think, and so I figured it was time to ask about this rather heavy subject.
This post turned out to be rather huge. Apologies for that, though I think in this case, explaining the details as thoroughly as possible is warranted, because it gives you the whole picture of everything I did to find this bugger, which was a lot. This bug alone took me roughly three 10+ hour days to track down...
When I hunt leaks
When I hunt leaks I tend to do it in phases, where I escalate "deeper" into the problem if it's not solvable in an earlier phase. These phases begin with Leaks telling me there's an issue.
In this particular case (which is an example; the bug is solved; I'm not asking for answers to solving this bug, I'm asking for ways to improve the process in which I find the bug), I am finding a leak (two, even) in a multithreaded application which is fairly large, especially including the 3 or so external libraries I'm using in it (unzip feature and http server). So let's see the process where I fix this leak.
Phase 1: Leaks tells me there's a leak
Well, that's interesting. Since my app is multithreaded, my first thought is that I forgot to put an NSAutoreleasePool in somewhere, but after checking in all the right places, this is not the case. I take a look at the stack trace.
Phase 2: The stack trace
Both of the GeneralBlock-160 leaks have identical stack traces (which is odd since I have it grouped by "identical backtraces", but anyway), which start at thread_assign_default and end at malloc under _NSAPDataCreate. In between, there is absolutely nothing that correlates to my app. Not a single one of those calls is "mine". So I do some Googling around to figure out what these might be used for.
First we have a number of methods which obviously have to do with a thread callback, such as POSIX thread calls going into NSThread calls.
At #8-#6 in this (inverted) stack trace, we have +[NSThread exit] followed by pthread_exit and _pthread_exit, which is interesting, but in my experience I can't really tell if it's indicative of some specific case or if it's simply "how things go".
After that we have a thread cleanup method called _pthread_tsd_cleanup -- whatever "tsd" stands for I'm not sure, but regardless, I move on.
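For what it's worth, "tsd" almost certainly stands for thread-specific data: the pthread facility where a value is attached to one thread under a key, and the key's destructor runs when that thread exits. A minimal sketch of that mechanism, using only the documented POSIX calls (plain C, which compiles unchanged in an Objective-C file; demoKey, demoDestructor and threadBody are made-up names for illustration):

```objc
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_key_t demoKey; // hypothetical key, for illustration only

// Called on the exiting thread for the non-NULL value stored under demoKey.
static void demoDestructor(void *value)
{
    printf("cleaning up thread-specific value %d during thread exit\n", *(int *)value);
    free(value);
}

static void *threadBody(void *unused)
{
    (void)unused;
    int *value = malloc(sizeof *value);
    if (value == NULL) {
        return NULL;
    }
    *value = 42;
    pthread_setspecific(demoKey, value); // attach the value to this thread only
    return NULL;                          // thread exit runs the key destructors
}

int main(void)
{
    pthread_t thread;
    pthread_key_create(&demoKey, demoDestructor);
    pthread_create(&thread, NULL, threadBody, NULL);
    pthread_join(thread, NULL);
    return 0;
}
```

The point of the sketch is only that cleanup code registered this way runs on the thread that is exiting, not on the main thread, which is worth keeping in mind when a frame like this shows up in a trace.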
At #4-#3 we have:
CA::Transaction::release_thread(void*)
CAPushAutoreleasePool
Interesting. We have Core Animation here. That, I've learned the very hard way, means that I'm probably doing UIKit calls from a background thread, which I must not. The big question is where, and how. While it may be easy to say "thou shalt not call UIKit from ye olde background thread", it's not as easy to know what exactly constitutes a UIKit call. As you'll see in this case, it's far from obvious.
Then #2-#1 turn out to be way too low level to be of any real use. I think.
I still have no clue where to even begin looking for this memory leak. So I do the only thing I can think of.
Phase 3: return galore
Suppose we have a call tree that looks something like this:
App start
    |
Some init
    |    \
A init    B init - Other case - Fourth case
    \    /            \
   Some case        Third case
       |
   Fifth case
      ...
Rough outline of an app's lifecycle, that. In short, we have a number of paths the app can take depending on whatever happens, and each of these paths consists of a bunch of code being called in various places. So I pull out the scissors and start chopping. I start close to "App start" initially, and slowly move down the line towards crossroads, where I only allow one path.
So I have
// ...
[fooClass doSomethingAwesome:withThisCoolThing];
// ...
And I do
// ...
return;
[fooClass doSomethingAwesome:withThisCoolThing];
// ...
And then install the app on the device, close it down, alt-tab to Instruments, hit cmd-R, hammer on the app like a monkey, look for leaks, and after maybe 10 "cycles" if there's nothing, I conclude that the leak is further down the code. Possibly in fooClass's doSomethingAwesome: or below the call to fooClass.
So I move that return one step below the call to fooClass and test again. If the leak doesn't appear now, great, fooClass is innocent.
There are a few issues with this method.
- Memory leaks tend to be a bit snobbish about when to reveal themselves. You need romantic music and candles, so to say, and cutting one end off in one place sometimes results in the memory leak deciding not to appear at all. I often had to go back because the leak had appeared after I added, say, this line:
UIImage *a;
(which obviously isn't leaking by itself)
- It's excruciatingly slow and tiring to do for a big program. Especially if you end up having to back up again.
- It's hard to keep track of. I kept putting in
// 17 14.48.25: 3 leaks @ RSx10
which in English meant "July 17th, 14:48.25: 3 leaks occurred when I repeatedly selected the item 10 times" sprinkled throughout the entire app. Messy, but at least it let me see clearly where I'd tested things and what the results were.
This method eventually took me down to the very bottom of a class which handled thumbnails. The class had two methods, one which initialized things and then did a [NSThread detachNewThreadSelector:toTarget:withObject:] call to a separate method which processed the actual images and put them into the individual views after scaling them down to the right size.
It was sort of like this:
// no leaks if I return here
[NSThread detachNewThreadSelector:@selector(loadThumbnails) toTarget:self withObject:nil];
// leaks appear if I return here
But if I went into -loadThumbnails and stepped down through it, the leaks would disappear and appear in a very random fashion. On one extensive run, I would have no leaks, and then if I moved the return statement down below e.g. UIImage *small, *bloated; I would have leaks appearing. In short, it was very erratic.
After some more testing, I realized that leaks would tend to appear more often if I reloaded things quicker while in the app. After many hours of pain, I realized that if this external thread did not finish executing before I loaded another session (thus creating a second thumbnail class and discarding this one), the leak would appear.
That's a nice clue. So I added a BOOL called worldExists which was set to NO as soon as a new session was initiated, and then started sprinkling -loadThumbnails's for loop with
if (worldExists) [action]
if (worldExists) [action 2]
// ...
and also made sure to exit the loop as soon as I found out that !worldExists. But the leak remained.
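Spelled out, the flag-checking loop looks roughly like the sketch below. It assumes manual reference counting and Foundation only; ThumbsSketch, -cancel, the loop count and the sleep that stands in for image scaling are all made up for illustration, and only worldExists and -loadThumbnails come from my actual code:

```objc
// Build assumption: manual reference counting (-fno-objc-arc), Foundation only.
#import <Foundation/Foundation.h>

// Hypothetical stand-in for the thumbnail class.
@interface ThumbsSketch : NSObject
{
    volatile BOOL worldExists; // set to NO by the owner when a new session starts
}
- (void)cancel;
- (void)loadThumbnails;
@end

@implementation ThumbsSketch
- (id)init
{
    if ((self = [super init])) {
        worldExists = YES;
    }
    return self;
}

- (void)cancel
{
    worldExists = NO;
}

- (void)loadThumbnails
{
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    for (int i = 0; i < 100; i++) {
        if (!worldExists) break;              // bail out as soon as the session is gone
        [NSThread sleepForTimeInterval:0.01]; // placeholder for scaling one image
        if (!worldExists) break;              // re-check before touching any views
        // placeholder: hand the scaled image to its view here
    }
    [pool drain];
}
@end

int main(void)
{
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    ThumbsSketch *thumbs = [[ThumbsSketch alloc] init];
    [NSThread detachNewThreadSelector:@selector(loadThumbnails) toTarget:thumbs withObject:nil];
    [NSThread sleepForTimeInterval:0.05];
    [thumbs cancel];  // simulate a new session starting
    [thumbs release]; // the owner drops its reference
    [NSThread sleepForTimeInterval:0.2];
    [pool drain];
    return 0;
}
```

Note that the flag only shortens how long the thread keeps running. It does nothing about which thread ends up doing the final release, which is why it did not help here.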
And the return method was still showing leaks in very erratic places, seemingly at random.
So I tried adding this at the very top of -loadThumbnails:
for (int i = 0; i < 50 && worldExists; i++) {
[NSThread sleepForTimeInterval:0.1f];
}
return;
And believe it or not, the leaks actually appeared if I loaded a new session within 5 seconds.
Finally, I put a breakpoint in -dealloc for the thumbnail class. The stack trace for this looked like this:
#0 -[Thumbs dealloc] (self=0x162ec0, _cmd=0x32299664) at /Users/me/Documents/myapp/Classes/Thumbs.m:28
#1 0x32c0571a in -[NSObject release] ()
#2 0x32b824d0 in __NSFinalizeThreadData ()
#3 0x30c3e598 in _pthread_tsd_cleanup ()
#4 0x30c3e2b2 in _pthread_exit ()
#5 0x30c3e216 in pthread_exit ()
#6 0x32b15ffe in +[NSThread exit] ()
#7 0x32b81d16 in __NSThread__main__ ()
#8 0x30c8f78c in _pthread_start ()
#9 0x30c85078 in thread_start ()
Well... that doesn't look too bad. If I wait until the -loadThumbnails method is finished, the trace looks different though:
#0 -[Thumbs dealloc] (self=0x194880, _cmd=0x32299664) at /Users/me/Documents/myapp/Classes/Thumbs.m:26
#1 0x32c0571a in -[NSObject release] ()
#2 0x00009556 in -[WorldLoader dealloc] (self=0x192ba0, _cmd=0x32299664) at /Users/me/Documents/myapp/Classes/WorldLoader.m:33
#3 0x32c0571a in -[NSObject release] ()
#4 0x000045b2 in -[WorldViewController setupWorldWithPath:] (self=0x11e9d0, _cmd=0x3fee0, path=0x4cb84) at /Users/me/Documents/myapp/Classes/WorldViewController.m:98
#5 0x32c29ffa in -[NSObject performSelector:withObject:] ()
#6 0x32b81ece in __NSThreadPerformPerform ()
#7 0x32c23c14 in CFRunLoopRunSpecific ()
#8 0x32c234e0 in CFRunLoopRunInMode ()
#9 0x30d620da in GSEventRunModal ()
#10 0x30d62186 in GSEventRun ()
#11 0x314d54c8 in -[UIApplication _run] ()
#12 0x314d39f2 in UIApplicationMain ()
#13 0x00002fd2 in main (argc=1, argv=0x2ffff5dc) at /Users/me/Documents/myapp/main.m:14
Quite different, in fact. At this point, I was still clueless, believe it or not, but eventually I figured out what was going on.
The problem is the following: when I do [NSThread detachNewThreadSelector:toTarget:withObject:] in the thumbnail loader, NSThread retains the object until the thread finishes. In the case where the thumbnail loading doesn't finish before I load another session, all of my retains on the thumbnail loader are released, but since the thread is still running, NSThread keeps it alive.
As soon as the thread returns from -loadThumbnails, NSThread releases it, its retain count hits 0 and it goes straight into -dealloc... while still in the background thread.
And when I then call [super dealloc], UIView obediently tries to remove itself from its superview, which is a UIKit call on a background thread. Consequently a leak occurs.
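A minimal sketch of that mechanism, assuming manual reference counting and Foundation only. Loader and -work are made-up names standing in for the thumbnail class and its loading method, and no UIKit is involved; the sketch only shows which thread ends up running -dealloc when the owner lets go first:

```objc
// Build assumption: manual reference counting, e.g.
//   clang -fno-objc-arc -framework Foundation demo.m -o demo
#import <Foundation/Foundation.h>

// Hypothetical stand-in for the thumbnail class; not the real Thumbs.
@interface Loader : NSObject
- (void)work;
@end

@implementation Loader
- (void)work
{
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    [NSThread sleepForTimeInterval:0.2]; // pretend to process images
    [pool drain];
}

- (void)dealloc
{
    // Reports which thread performed the final release.
    NSLog(@"dealloc on %@ thread", [NSThread isMainThread] ? @"main" : @"background");
    [super dealloc];
}
@end

int main(void)
{
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    Loader *loader = [[Loader alloc] init];
    // NSThread retains the target for the lifetime of the detached thread.
    [NSThread detachNewThreadSelector:@selector(work) toTarget:loader withObject:nil];
    [loader release];                    // the owner lets go while the thread is still running
    [NSThread sleepForTimeInterval:1.0]; // give the thread time to finish
    [pool drain];
    return 0;
}
```

Under these assumptions the log line should report the background thread, because the last reference is the one NSThread drops as the thread exits. A check like [NSThread isMainThread] in -dealloc of a UIView subclass is a cheap way to see the same thing in a real app.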
The solution I came up with was to wrap the loader in two other methods. I renamed it to -_loadThumbnails and then did the following:
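[self retain]; // <-- added this before the detaching; balanced in -doneLoadingThumbnails
[NSThread detachNewThreadSelector:@selector(loadThumbnails) toTarget:self withObject:nil];

// added these two new methods

// Runs on the main thread, so if this is the last release, -dealloc runs there too.
- (void)doneLoadingThumbnails
{
    [self release];
}

// Thread entry point: do the real work, then hand the extra release to the main thread.
- (void)loadThumbnails
{
    [self _loadThumbnails];
    [self performSelectorOnMainThread:@selector(doneLoadingThumbnails) withObject:nil waitUntilDone:NO];
}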
All that said (and I said a lot -- sorry about that), the big question is: how do you figure these odd-ball things out without going through all of the above?
What reasoning did I miss in the above process? At what point did you realize where the problem was? What were the redundant steps in my method? Can I skip phase 3 (return galore) somehow, or cut it down, or make it more efficient?
I know this question is, well, vague and huge, but this whole concept is vague and huge. I'm not asking you to teach me how to find leaks (I can do that... it's just very, very painful), I'm asking what people tend to do to cut down on the process time. Asking people "how do you find leaks?" is impossible, because there are so many different kinds. But the one type I tend to have issues with is the one that looks like the above, with no calls inside your actual app.
What process do you use to more efficiently track it down?