views:

348

answers:

13
+3  Q: 

How to log mallocs

This is a bit hypothetical and grossly simplified but...

Assume a program that will be calling functions written by third parties. These parties can be assumed to be non-hostile but can't be assumed to be "competent". Each function will take some arguments, have side effects and return a value. They have no state while they are not running.

The objective is to ensure they can't cause memory leaks by logging all mallocs (and the like) and then freeing everything after the function exits.

Is this possible? Is this practical?

P.S. The important part to me is ensuring that no allocations persist, so ways to remove memory leaks that don't guarantee this are not useful to me.

+1  A: 

Can't you just force them to allocate all their memory on the stack? That way it would be guaranteed to be freed after the function exits.

Mo
+1  A: 

A better solution than attempting to log mallocs might be to sandbox the functions when you call them: give them access to a fixed segment of memory and then free that segment when the function is done running.

Unconfined, incompetent memory usage can be just as damaging as malicious code.

Jekke
+2  A: 

You could run the third party functions in a separate process and close the process when you are done using the library.
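A minimal POSIX sketch of the separate-process idea, assuming a Unix-like system and a third-party function that takes an int and returns an int (`run_isolated` and `third_party_fn` are hypothetical names). The function runs in a forked child, its result comes back over a pipe, and anything it allocated is reclaimed by the OS when the child exits:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a third-party function that leaks. */
int third_party_fn(int x)
{
    int *scratch = malloc(1024 * sizeof *scratch);   /* never freed */
    if (scratch) scratch[0] = x;
    return x * 2;
}

/* Runs fn(arg) in a forked child and passes its return value back
   through a pipe. Returns 0 on success, -1 on any failure. Memory the
   child allocates is reclaimed by the OS when the child exits. */
int run_isolated(int (*fn)(int), int arg, int *result)
{
    int fds[2];
    if (pipe(fds) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                      /* child */
        close(fds[0]);
        int r = fn(arg);
        ssize_t w = write(fds[1], &r, sizeof r);
        _exit(w == (ssize_t)sizeof r ? 0 : 1);   /* skip atexit/stdio flush */
    }

    close(fds[1]);                       /* parent */
    ssize_t n = read(fds[0], result, sizeof *result);
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return -1;
    return n == (ssize_t)sizeof *result ? 0 : -1;
}
```

For example, `run_isolated(third_party_fn, 21, &r)` should return 0 and leave 42 in `r`, even though the child leaked its buffer. One caveat: side effects on the caller's own memory are lost too, so only what you send back over the pipe survives. Forking per call is also relatively expensive.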

+3  A: 

First, you have to provide the entry points for malloc(), free(), and friends. Because this code is already compiled (right?), you can't depend on #define to redirect.

Then you can implement these in the obvious way and log that they came from a certain module by linking those routines to those modules.

The fastest way involves no logging at all. If the amount of memory they use is bounded, why not pre-allocate all the "heap" they'll ever need and write an allocator out of that? Then when it's done, free the entire "heap" and you're done! You could extend this idea to multiple heaps if it's more complex than that.
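A minimal sketch of the pre-allocated "heap" idea as a bump (arena) allocator. It assumes the third-party code's allocations can somehow be redirected to `arena_alloc`/`arena_free` (for example through the entry-point replacement described above); all names here are hypothetical:

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    unsigned char *base;   /* the one pre-allocated block */
    size_t size;           /* its total size */
    size_t used;           /* bytes handed out so far */
} arena;

int arena_init(arena *a, size_t size)
{
    a->base = malloc(size);
    a->size = a->base ? size : 0;
    a->used = 0;
    return a->base ? 0 : -1;
}

void *arena_alloc(arena *a, size_t n)
{
    /* Round the offset up so every block is aligned like malloc's. */
    size_t align = _Alignof(max_align_t);
    size_t off = (a->used + align - 1) & ~(align - 1);
    if (off > a->size || n > a->size - off) return NULL;   /* heap exhausted */
    a->used = off + n;
    return a->base + off;
}

/* Individual frees are no-ops; everything is reclaimed at once. */
void arena_free(arena *a, void *p) { (void)a; (void)p; }

/* Call after the third-party function returns: nothing survives. */
void arena_reset(arena *a) { a->used = 0; }

void arena_destroy(arena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = a->used = 0;
}
```

Per-block free becomes a no-op and `arena_reset` reclaims everything in one step, which gives the "no allocation persists" guarantee directly. The trade-off is that code which allocates and frees in a loop can exhaust the arena, since freed space is not reused until the reset.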

If you really do need to "log" rather than make your own allocator, here are some ideas. One: use a hash table keyed on pointers, with internal chaining. Another: allocate extra space in front of every block and put your own structure there containing, say, an index into your "log table", then keep a free-list of log table entries (as a stack, so getting a free entry or putting one back is O(1)). This takes more memory but should be fast.
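A sketch of the second idea, assuming a fixed-capacity log table (`LOG_CAP`, `logged_malloc`, `logged_free`, and `logged_free_all` are hypothetical names). Each block gets a hidden header holding its slot index, and unused slots are kept on a stack so both allocation and free stay O(1):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define LOG_CAP 1024   /* hypothetical fixed size of the log table */

/* Hidden header in front of each block: its slot in the log table.
   The union with max_align_t keeps the user pointer aligned. */
typedef union {
    size_t slot;
    max_align_t align;
} block_header;

static void  *log_table[LOG_CAP];   /* live user pointers; NULL = unused */
static size_t free_slots[LOG_CAP];  /* stack of unused slot indices */
static size_t free_top;             /* entries currently on the stack */
static int    log_ready;

static void log_init(void)
{
    for (size_t i = 0; i < LOG_CAP; i++)
        free_slots[i] = LOG_CAP - 1 - i;
    free_top = LOG_CAP;
    log_ready = 1;
}

void *logged_malloc(size_t n)
{
    if (!log_ready) log_init();
    if (free_top == 0 || n > SIZE_MAX - sizeof(block_header))
        return NULL;                     /* log full or size overflow */
    block_header *h = malloc(sizeof *h + n);
    if (!h) return NULL;
    h->slot = free_slots[--free_top];    /* O(1) pop */
    log_table[h->slot] = h + 1;
    return h + 1;
}

void logged_free(void *p)
{
    if (!p) return;
    block_header *h = (block_header *)p - 1;
    log_table[h->slot] = NULL;
    free_slots[free_top++] = h->slot;    /* O(1) push */
    free(h);
}

/* Frees every block still in the log; returns how many there were. */
size_t logged_free_all(void)
{
    size_t leaked = 0;
    for (size_t i = 0; i < LOG_CAP; i++) {
        if (log_table[i]) {
            logged_free(log_table[i]);
            leaked++;
        }
    }
    return leaked;
}
```

Calling `logged_free_all()` after the third-party function returns both reports and releases anything it left behind. As written it is not thread-safe, and `LOG_CAP` bounds the number of simultaneously live blocks.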

Is it practical? I think it is, so long as the speed-hit is acceptable.

Jason Cohen
A: 

Since you're worried about memory leaks and talking about malloc/free, I assume you're in C. I'm also assuming based on your question that you do not have access to the source code of the third party library.

The only thing I can think of is to examine memory consumption of your app before & after the call, log error messages if they're different and convince the third party vendor to fix any leaks you find.
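On Linux with glibc, one way to take the before/after measurement is `mallinfo2()`. This is a sketch under several assumptions: glibc 2.33 or later, a single-threaded caller (mallinfo2 only reports the main arena), and a hypothetical `leaky_vendor_call` standing in for the third-party function:

```c
#include <malloc.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical vendor call that leaks 256 bytes per invocation.
   The volatile sink stops the compiler from optimizing the malloc away. */
static void *volatile leak_sink;
void leaky_vendor_call(void) { leak_sink = malloc(256); }

/* Bytes glibc currently considers in use: main-arena chunks plus
   chunks served directly by mmap(). */
size_t heap_in_use(void)
{
    struct mallinfo2 mi = mallinfo2();
    return mi.uordblks + mi.hblkhd;
}

/* Calls fn() and returns how much heap usage grew across the call. */
long long heap_delta(void (*fn)(void))
{
    size_t before = heap_in_use();
    fn();
    size_t after = heap_in_use();
    return (long long)after - (long long)before;
}
```

A positive delta is only a hint: the allocator may satisfy requests from cached chunks, and a library may legitimately keep a cache, so this detects leaks rather than preventing them.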

17 of 26
+4  A: 

You don't specify the operating system or environment; this answer assumes Linux, glibc, and C.

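You can set __malloc_hook, __free_hook, and __realloc_hook to point to functions which will be called from malloc(), free(), and realloc() respectively. There is a __malloc_hook manpage showing the prototypes. You can track allocations in these hooks, then return to let glibc handle the memory allocation/deallocation. (These hooks were later deprecated and were removed from glibc in version 2.34, so on a current system you would need another interposition mechanism, such as LD_PRELOAD.)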

It sounds like you want to free any live allocations when the third-party function returns. There are ways to have gcc automatically insert calls at every function entrance and exit using -finstrument-functions, but I think that would be inelegant for what you are trying to do. Can you have your own code call a function in your memory-tracking library after calling one of these third-party functions? You could then check if there are any allocations which the third-party function did not already free.
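A sketch of the "call your tracking library afterwards" approach, assuming the third-party function's allocations are routed to `tracked_malloc`/`tracked_free` (for instance from the hooks above) and that it is the only user of those functions. All names, including `vendor_fn`, are hypothetical. Live blocks sit on an intrusive doubly linked list, so after the call anything still on the list is a leak:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header placed in front of every tracked block. The union with
   max_align_t keeps the pointer handed to the caller suitably aligned. */
typedef union track_hdr {
    struct { union track_hdr *prev, *next; } link;
    max_align_t align;
} track_hdr;

static track_hdr *live_head;   /* list of blocks not yet freed */

void *tracked_malloc(size_t n)
{
    if (n > SIZE_MAX - sizeof(track_hdr)) return NULL;
    track_hdr *h = malloc(sizeof *h + n);
    if (!h) return NULL;
    h->link.prev = NULL;                 /* push onto the front */
    h->link.next = live_head;
    if (live_head) live_head->link.prev = h;
    live_head = h;
    return h + 1;
}

void tracked_free(void *p)
{
    if (!p) return;
    track_hdr *h = (track_hdr *)p - 1;   /* unlink in O(1) */
    if (h->link.prev) h->link.prev->link.next = h->link.next;
    else live_head = h->link.next;
    if (h->link.next) h->link.next->link.prev = h->link.prev;
    free(h);
}

/* Calls fn(arg), then frees whatever is still on the live list.
   Returns the number of leaked blocks that were reclaimed. */
size_t call_and_reclaim(void (*fn)(void *), void *arg)
{
    fn(arg);
    size_t reclaimed = 0;
    while (live_head) {
        tracked_free(live_head + 1);
        reclaimed++;
    }
    return reclaimed;
}

/* Hypothetical third-party function: cleans up one block, leaks two. */
void vendor_fn(void *arg)
{
    (void)arg;
    tracked_free(tracked_malloc(32));
    (void)tracked_malloc(16);
    (void)tracked_malloc(64);
}
```

With a single point of call, as the asker later confirms, reclaiming the whole list is safe; if the host also used these functions, you would need to remember which blocks existed before the call.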

DGentry
A: 

@Mo How can I prevent them from using malloc? (assume I'm handed a dynamic lib SO/DLL)

@Jekke Again, how do I force them to stay within a given memory block?

@Jason Cohen I like your first idea; I think some fun with linker directives might make that work.

@17 of 26 "C like" is close enough as I'm not thinking of any language in particular. Yes, I'm thinking binaries. One scenario would have access to the source but would not be allowed to alter it in any way.

BCS
A: 

@Denton Gentry

I like it! I'm assuming a single point of call for the functions in question so the "who does the free" thing is a non-issue.

I guess the only issue that leaves open would be if they try to mmap something in, but I don't think that is too big an issue in my case.

BCS
+1  A: 

In the past I wrote a software library in C that had a memory management subsystem that contained the ability to log allocations and frees, and to manually match each allocation and free. This was of some use when attempting to find memory leaks, but it was difficult and time consuming to use. The number of logs was overwhelming, and it took an extensive amount of time to understand the logs.

That being said, if your third party library makes extensive allocations, it's more than likely impractical to track this via logging. If you're running in a Windows environment, I would suggest using a tool such as Purify[1] or BoundsChecker[2], which should be able to detect leaks in your third party libraries. The investment in the tool should pay for itself in time saved.

[1]: http://www-01.ibm.com/software/awdtools/purify/ Purify

[2]: http://www.compuware.com/products/devpartner/visualc.htm BoundsChecker

Steve Wranovsky
A: 

@Steve W

Good point in most cases, but not applicable in my case for a few reasons:

My definition of a "memory leak" is much more demanding: it includes any internal preservation of state, which is not allowed.

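#include <stdlib.h>

/* Hypothetical name. Frees its previous block each time, so nothing
   grows without bound, but one allocation always survives between calls. */
void keeps_state(void)
{
  static void *f = NULL;
  if (f != NULL) free(f);
  f = malloc(16);
}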

For that matter, I should really check to see if there are any globals in the code (crud :( )

Also, I don't need to fix the errors, just deal with them.

BCS
A: 

If you have money to spare, then consider using Purify to track issues. It works wonders, and does not require source code or recompilation. There are also other debugging malloc libraries available that are cheaper. Electric Fence is one name I recall. That said, the debugging hooks mentioned by Denton Gentry seem interesting too.

Jonathan Leffler
A: 

If you're too poor for Purify, try Valgrind. It is a lot better than it was 6 years ago and a lot easier to dive into than Purify.

Mitch Haile
Yeah - valgrind works pretty well, too. Thanks for the reminder.
Jonathan Leffler
A: 

Microsoft Windows provides (use SUA if you need POSIX) what is quite possibly the most advanced debugging infrastructure for the heap, and for other APIs known to use the heap, of any shipping OS today.

The __malloc() debug hooks and the associated CRT debug interfaces are nice for cases where you have the source code to the tests; however, they can often miss allocations made by standard libraries or other linked code. This is expected, since they are the Visual Studio heap debugging infrastructure.

gflags is a very comprehensive and detailed set of debugging capabilities which has been included with Windows for many years. It has advanced functionality for both source-available and binary-only use cases, since it is the OS heap debugging infrastructure.

It can log full stack traces of all heap users, for all heap-modifying entry points, serially if needed, with symbolic information resolved in a post-processing step. It can also modify the heap with pathological layouts that align each allocation so that the page protection offered by the VM system is used optimally (i.e. your requested heap block is allocated at the end of a page, so even a single-byte overflow is detected at the time of the overflow).
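The end-of-page placement is the same trick Electric Fence uses on Unix. Here is a rough POSIX sketch of the idea (not of how gflags implements it), with hypothetical names `guarded_alloc`, `guarded_free`, and `guard_map`: each block is placed so it ends exactly where an inaccessible guard page begins, so writing one byte past the end faults immediately:

```c
#define _DEFAULT_SOURCE                  /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    void  *base;   /* start of the whole mapping */
    size_t len;    /* data pages + one guard page */
} guard_map;

/* Returns an n-byte block that ends exactly where a PROT_NONE guard
   page begins, or NULL on failure. */
void *guarded_alloc(size_t n, guard_map *m)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t data_pages = (n + page - 1) / page;
    size_t len = (data_pages + 1) * page;
    unsigned char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return NULL;

    unsigned char *guard = base + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(base, len);
        return NULL;
    }
    m->base = base;
    m->len = len;
    return guard - n;                    /* last byte sits just before the guard */
}

void guarded_free(guard_map *m)
{
    munmap(m->base, m->len);
    m->base = NULL;
    m->len = 0;
}
```

Each allocation costs at least two pages of address space, which is why this is a debugging technique rather than a production allocator. Note that the returned pointer is only as aligned as `n` allows.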

umdh is a tool which can help assess the status at various checkpoints; however, the data is continually accumulated during the execution of the target, so it is not a simple checkpointing debug stop in the traditional sense. Also, WARNING: last I checked, at least, the total size of the circular buffer which stores the stack information for each request is somewhat small (64k entries, entries + stack), so you may need to dump rapidly for heavy heap users. There are other ways to access this data, but umdh is fairly simple.

NOTE: there are 2 modes:

  1. MODE 1, umdh {-p:Process-id|-pn:ProcessName} [-f:Filename] [-g]
  2. MODE 2, umdh [-d] {File1} [File2] [-f:Filename]

    I do not know what insanity gripped the developer who chose to alternate between -p:foo argument specifiers and naked positional arguments, but it can get a little confusing.

The debugging SDK works with a number of other tools. memsnap is a tool which apparently focuses on memory leaks and such, but I have not used it, so your mileage may vary.

Run gflags with no arguments for the UI mode; +args and /args are also different "modes" of use.

RandomNickName42