I've got a memory leak in the Upstart init process (pid 1). What options do I have for debugging it?

EDIT: Please suggest some real tools for this; manually adding printfs or calculating memory allocations by hand isn't going to cut it. Dumping init's core and poking around in it is not really an option either.

UPD1: valgrind doesn't work. Replacing /sbin/init on the kernel command line with the proper valgrind + init magic doesn't seem to be an option: valgrind tries to read its own smaps under /proc, but /proc isn't available before init has run.

UPD2: dmalloc doesn't work either (doesn't compile on ARM).

+2  A: 

You can instrument memory allocation yourself by hooking the malloc/free calls and counting the number of bytes allocated and freed each time.
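A minimal sketch of that kind of byte counting, assuming you route init's allocations through your own hooks. The names counted_malloc, counted_free and counted_outstanding are made up for illustration; each block carries a small header so the free hook knows how many bytes are being returned.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counters; the names are illustrative only. */
static size_t bytes_allocated;
static size_t bytes_freed;

/* Header stored in front of every block so the free hook knows its size.
 * The union pads the header so the pointer handed out stays aligned. */
typedef union {
    size_t size;
    max_align_t align;
} block_header;

void *counted_malloc(size_t n)
{
    block_header *h = malloc(sizeof *h + n);
    if (h == NULL)
        return NULL;
    h->size = n;
    bytes_allocated += n;
    return h + 1;               /* caller sees the memory after the header */
}

void counted_free(void *p)
{
    if (p == NULL)
        return;
    block_header *h = (block_header *)p - 1;
    bytes_freed += h->size;
    free(h);
}

/* Bytes currently allocated and not yet freed. */
size_t counted_outstanding(void)
{
    return bytes_allocated - bytes_freed;
}
```

If counted_outstanding() keeps growing while init is idle, something is leaking. The counters alone won't say where, so this is only a first check.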

AmineK
+1 for instrumenting
Amigable Clark Kant
+1  A: 

You can also use init unchanged, but create a wrapper which sets the MALLOC_CHECK_ environment variable (note the trailing underscore) to 1 or higher. This will let you see some memory allocation diagnostics.

A variation is to change init source code slightly to set that environment variable itself early before it starts using malloc.

You can also, as AmineK suggested, add debug code to the init source code itself.
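A rough sketch of the wrapper idea, assuming a glibc-based system. The function names are hypothetical, and so is the path you pass in; it would be wherever you moved the original init binary. Keep in mind that MALLOC_CHECK_ mainly reports heap corruption such as double frees, so it may not point at a leak directly.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Set glibc's MALLOC_CHECK_ variable (note the trailing underscore).
 * Returns 0 on success and -1 on failure, like setenv(). */
int enable_malloc_check(const char *level)
{
    return setenv("MALLOC_CHECK_", level, 1);
}

/* Enable checking, then replace this process with the real init.
 * real_init is a hypothetical path chosen by you.
 * Only returns if something failed. */
int exec_real_init(const char *real_init, char *const argv[])
{
    if (enable_malloc_check("1") != 0)
        return -1;
    execv(real_init, argv);     /* same pid; the environment is inherited */
    perror("execv");
    return -1;
}
```

The wrapper's main() would just call exec_real_init() with the relocated binary's path and its own argv. Because execv() replaces the process image, the real init still runs as pid 1.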

Amigable Clark Kant
A: 

You could try linking your version of upstart with Google's TCMalloc. It comes with a builtin heap checker.

The heap checker can be enabled in two ways:

  • set the environment variable HEAPCHECK to one of { normal | strict | draconian }.
  • set HEAPCHECK to local and check code by hand with HeapProfileLeakChecker objects.

I don't know how to set an environment variable for init, however.

caspin
TCMalloc doesn't work on ARM either :/ Thanks for the suggestion, though.
Tuminoid
+6  A: 

A poor man's solution would be to just log every call to malloc and free, then comb through the logs and look for a pattern.

ld provides an amazing feature that could help here.

--wrap=symbol

Use a wrapper function for symbol. Any undefined reference to symbol will be resolved to "__wrap_symbol". Any undefined reference to "__real_symbol" will be resolved to symbol.

This can be used to provide a wrapper for a system function. The wrapper function should be called "__wrap_symbol". If it wishes to call the system function, it should call "__real_symbol".

Here is a trivial example:

void *
__wrap_malloc (size_t c)
{
   printf ("malloc called with %zu\n", c);
   return __real_malloc (c);
}

If you link other code with this file using --wrap malloc, then all calls to "malloc" will call the function "__wrap_malloc" instead. The call to "__real_malloc" in "__wrap_malloc" will call the real "malloc" function.

You may wish to provide a "__real_malloc" function as well, so that links without the --wrap option will succeed. If you do this, you should not put the definition of "__real_malloc" in the same file as "__wrap_malloc"; if you do, the assembler may resolve the call before the linker has a chance to wrap it to "malloc".


Update

Just to be clear on how this is useful.

  • Add a custom file to Upstart's build.

Like this:

#include <stddef.h>

/* Resolved by the linker to the real malloc/free when --wrap is used. */
void *__real_malloc( size_t c );
void __real_free( void *addr );

void *__wrap_malloc( size_t c )
{
   void *malloced = __real_malloc(c);
   /* log malloced with its associated backtrace */
   /* something like: <malloced>: <bt-symbol-1>, <bt-symbol-2>, .. */
   return malloced;
}

void __wrap_free( void *addr )
{
   /* log addr with its associated backtrace */
   /* something like: <addr>: <bt-symbol-1>, <bt-symbol-2>, .. */
   __real_free(addr);
}

  • Recompile upstart with debug symbols (-g) so you can get some nice backtraces. You can still optimize (-O2/-O3) the code if you wish.

  • Link Upstart with the extra LDFLAGS -Wl,--wrap=malloc -Wl,--wrap=free (the -Wl, prefix passes the options through gcc to ld).
    Now anywhere Upstart calls malloc, the symbol will be magically resolved to your new symbol __wrap_malloc. Beautifully, this is all transparent to the compiled code, as it happens at link time.
    It's like shimming or instrumenting without any of the mess.

  • Run the recompiled Upstart as usual until you're sure the leak has occurred.

  • Look through the logs for mismatched malloceds and addrs.

A couple of notes:

  • The --wrap=symbol feature does not work with function names that are actually macros. So watch out for #define malloc nih_malloc. If this is what libnih does, you'd need to use --wrap=nih_malloc and __wrap_nih_malloc instead.
  • Use gcc's built-in backtracing features (for example __builtin_return_address, or glibc's backtrace() from <execinfo.h>).
  • All of these changes only affect the recompiled Upstart executable.
  • You could dump the logs to an SQLite DB instead, which may make it easier to find mismatched mallocs and frees.
  • You can make your log format an SQL insert statement, then just insert the lines into a database post-mortem for further analysis.
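The "log it with its backtrace" comments in the wrappers could be filled in with something like the helper below. This is only a sketch, assuming glibc's backtrace() is usable on your target (on ARM it may need the code built with unwind tables). The name log_alloc_event and the one-line record format are my own invention. It writes raw return addresses, which you can resolve after the fact with a tool such as addr2line.

```c
#include <execinfo.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_FRAMES 16

/* Write one record to fd: a tag ('M' or 'F'), the pointer, then the raw
 * return addresses of the callers, all on a single line.
 * Returns 0 on success, -1 if the write failed or was short. */
int log_alloc_event(int fd, char tag, void *ptr)
{
    void *frames[MAX_FRAMES];
    char line[64 + MAX_FRAMES * 20];
    int n = backtrace(frames, MAX_FRAMES);
    int len = snprintf(line, sizeof line, "%c %p", tag, ptr);

    /* Start at 1 to skip this function's own frame; stop before the
     * buffer could overflow. */
    for (int i = 1; i < n && len < (int)sizeof line - 24; i++)
        len += snprintf(line + len, sizeof line - len, " %p", frames[i]);

    line[len++] = '\n';
    return write(fd, line, len) == len ? 0 : -1;
}
```

__wrap_malloc would call log_alloc_event(fd, 'M', malloced) and __wrap_free would call log_alloc_event(fd, 'F', addr), with fd being whatever log destination is open that early in boot. One caveat: glibc's backtrace() may allocate memory the first time it is called, so it is probably safest to call it once at startup before the wrappers begin logging.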
caspin
Note that, if you don't want to use printf here, you could allocate a buffer and put the logging information into it, and then dump it out once init is finished (or just poke in it in the debugger). I expect that the ideal strategy will be to confirm that the results are consistent from run to run, figure out what line of the log contains the malloc without a matching free, and then set __wrap_malloc to trap on that line -- at which point, you can look at the call stack in your debugger and find the offending call.
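A small sketch of the buffer-and-trap idea from the comment above, with made-up names (record_event, alloc_log, trap_on_event) and an arbitrary capacity. Events go into a static array instead of through printf, and the hook can stop on a chosen event number once you know which allocation never gets freed.

```c
#include <signal.h>
#include <stddef.h>

#define LOG_CAPACITY 4096            /* hypothetical size; tune to the workload */

struct alloc_event {
    char tag;                        /* 'M' for malloc, 'F' for free */
    void *ptr;
    size_t size;
};

static struct alloc_event alloc_log[LOG_CAPACITY];
static size_t alloc_log_count;                /* events seen so far */
static size_t trap_on_event = (size_t)-1;     /* event number to stop on */

/* Record one event in memory and trap if it is the chosen one. */
void record_event(char tag, void *ptr, size_t size)
{
    size_t seq = alloc_log_count++;

    if (seq < LOG_CAPACITY) {
        alloc_log[seq].tag = tag;
        alloc_log[seq].ptr = ptr;
        alloc_log[seq].size = size;
    }
    if (seq == trap_on_event)
        raise(SIGTRAP);              /* stops in an attached debugger */
}
```

Only set trap_on_event with a debugger attached, since an unhandled SIGTRAP would terminate the process, which is not something you want happening to pid 1.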
Brooks Moses
Upstart uses libnih for all its structures and memory handling, and some other components do so too. Hooking some logging into nih's malloc/free doesn't cut it :/ Wrapping all of libnih's API in Upstart is going to be a lot of trouble, but I guess I'm really running out of options here.
Tuminoid
You say "Hooking some logging to nih's malloc/free doesn't make it", but why not? If you can elaborate on the problem, maybe someone can suggest a solution.
Brooks Moses
I already said: "and some other components do so too" -> there are multiple users of nih running, so adding logging in nih itself produces a mess. Thus, I need to wrap nih symbols in init instead, which is much more trouble.
Tuminoid
`--wrap=symbol` can *only* be used to mess with init's symbols. It will not (cannot) affect other users of the library. Further, it is as painless as possible to wrap libnih within init. That is what `--wrap=symbol` was invented for. See my update for more details.
caspin
My point was that if I wrap nih in init, I need to wrap 100 functions, as libnih is similar to glib: lots of utility functions that allocate memory. It isn't as simple as wrapping a single malloc/free pair. Anyway, I'll give you the bounty for the best try.
Tuminoid
I suspect that all of the memory allocation functions eventually end up calling malloc/realloc. Even libstdc++ calls malloc in its implementation of `new`. If the utility functions create memory pools you'll have a bit more work, maybe 3-4 more functions.
caspin
A: 

How about running pmap on the process and examining which memory segments are growing? That may give you some idea of what is eating memory. A little scripting could make this process almost automatic**.

** In a past life, I actually wrote a script that would take n pmap snapshots of a set of running processes spaced t seconds apart. The output of that was fed into a perl script that identified segments that changed their size. I used it to locate several memory leaks in some commercial code. [I would share the scripts, but they are covered under IP (copyright) of a previous employer.]
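This is not the script mentioned above, just a sketch of the same idea in C under my own assumptions. pmap gets its data from /proc/<pid>/maps, so a helper (hypothetically named mapping_bytes here) can total the size of a named mapping such as [heap], and you can compare the result between snapshots taken some seconds apart.

```c
#include <stdio.h>
#include <string.h>

/* Sum the sizes of all mappings whose name equals `name` (e.g. "[heap]")
 * in a stream formatted like /proc/<pid>/maps. Returns a byte count. */
unsigned long mapping_bytes(FILE *maps, const char *name)
{
    char line[512];
    unsigned long total = 0;

    while (fgets(line, sizeof line, maps) != NULL) {
        unsigned long start, end;
        char path[256] = "";

        /* Line layout: start-end perms offset dev inode [path] */
        if (sscanf(line, "%lx-%lx %*s %*s %*s %*s %255s",
                   &start, &end, path) >= 2
            && strcmp(path, name) == 0)
            total += end - start;
    }
    return total;
}
```

For init you would open /proc/1/maps (as root), call mapping_bytes(f, "[heap]"), sleep, and repeat. A heap figure that only ever grows is a strong hint, though it still doesn't tell you which call site is responsible.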

  • John
jwernerny