I've devoted a large number of lines of C code to cleanup-labels/conditionals for failed memory allocation (indicated by the alloc family returning NULL). I was taught that this was a good practice so that, on memory failure, an appropriate error status could be flagged and the caller could potentially perform "graceful memory cleanup" and retry. I now have some doubts about this philosophy that I'm hoping to clear up.
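To make the pattern concrete, here is a minimal sketch of the kind of code I mean (hypothetical names):

#include <stdlib.h>

/* Cleanup-label pattern: unwind partial allocations and report an
 * error status to the caller. */
int build_pair(char **a_out, char **b_out)
{
    char *a, *b;

    a = malloc(64);
    if (a == NULL)
        goto fail;
    b = malloc(64);
    if (b == NULL)
        goto free_a;

    *a_out = a;
    *b_out = b;
    return 0;       /* success */

free_a:
    free(a);
fail:
    return -1;      /* error status: caller might free memory and retry */
}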

I guess it's possible that a caller could deallocate excessive buffer space or strip relational objects of their data, but I find that the caller rarely has the capability (or is at the appropriate level of abstraction) to do so. Also, early-returning from the called function without side effects is often non-trivial.

I also just discovered the Linux OOM killer, which seems to make these efforts totally pointless on my primary development platform.

From the malloc(3) man page:

By default, Linux follows an optimistic memory allocation strategy. This means that when malloc() returns non-NULL there is no guarantee that the memory really is available. This is a really bad bug. In case it turns out that the system is out of memory, one or more processes will be killed by the infamous OOM killer.

I figure there are probably other platforms out there that follow the same principle. Is there something pragmatic that makes checking for OOM conditions worthwhile?

+1  A: 

You have to weigh up which is worse for you: putting all the work into checking for OOM, or having your program fail at unexpected times.

PiedPiper
This point seems moot, seeing as how it will still fail unexpectedly on the Linux platform, as I mention in the question.
cdleary
See my response as to the Linux issue.
Artelius
+7  A: 

Regardless of the platform (except maybe embedded systems) it's a good idea to check for NULL and then just exit without doing any (or much) cleanup by hand.

Out of memory isn't a simple error. It is a catastrophe on today's systems.

The book The Practice of Programming (Brian W. Kernighan and Rob Pike, 1999) defines functions like emalloc() which just exits with an error message if there's no memory left.
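A minimal sketch of such a wrapper, in the book's spirit (my illustration, not its exact code):

#include <stdio.h>
#include <stdlib.h>

/* emalloc: allocate n bytes or print a message and exit. */
void *emalloc(size_t n)
{
    void *p = malloc(n);
    if (p == NULL) {
        fprintf(stderr, "emalloc of %zu bytes failed\n", n);
        exit(EXIT_FAILURE);
    }
    return p;
}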

stesch
+1 This solution also seems applicable to embedded systems - you'd just have to customize `emalloc` for behaviors relevant to your platform. For example, an infinite loop instead of an `exit` call if you have no operating system on a microcontroller.
cdleary
A: 

Yes, I believe it is, if you follow the practice consistently. This may be impractical for a large program written in C because of the degree of manual labour it requires, but in a more modern language most of this work is done for you, because an out-of-memory condition results in a thrown exception.

The benefits of doing this consistently are that the program will not enter an undefined state due to the out-of-memory condition resulting in a buffer overrun (this obviously leaves the possibility of an undefined state due to an early exit from the function, although this is a different class of bug). Having done so, your program can consistently handle the error condition or, if the failure was a critical one, decide to quit in a graceful manner.

1800 INFORMATION
+3  A: 

With today's computers and the amount of RAM typically installed, checking for memory allocation errors at every call site is probably more effort than it is worth. As you've seen, it is often difficult or impossible to make a rational decision about what to deallocate. As your process allocates more and more memory, the OS correspondingly reduces the amount of memory available for disk buffers. When that falls below some threshold, the OS starts paging memory to disk. (This is a simplification, as there are many factors in memory management.)

Once the OS starts paging memory, the whole system gets progressively slower and slower, and it will probably be quite a while before your application ever actually sees a NULL from malloc (if at all).

With the sheer amount of memory available on today's systems, an "out of memory" error more likely means that a bug in your code tried to allocate an arbitrary amount of memory. In that case no amount of freeing and retrying on the part of your process is going to fix the problem.

Greg Hewgill
Does your answer take the possibility of embedded platforms into account? I'd like to think that C libraries I write could be migrated to embedded environments if there were a need.
cdleary
@cdleary: No, you're right it doesn't. I should have mentioned that; the embedded world is very different and different rules apply. Some critical systems will even have a rule of "no allocations after startup" or other strict controls on allocation. There may be no concept of "letting the application crash" because the application IS the only thing around. The embedded programmer must consider doing something sensible in this case.
Greg Hewgill
+3  A: 

I suggest an experiment - write a small program which keeps allocating memory without freeing it and then prints a small (fixed) message when allocation fails. What effects do you notice on your system when you run this program? Does the message ever get printed?
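A minimal sketch of such a test program (it touches each block so the pages are actually committed; the names and sizes are arbitrary):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    size_t mib = 0;

    /* Allocate 1 MiB at a time, writing to each block so the OS must
     * really commit the pages, until malloc() reports failure. */
    for (;;) {
        void *p = malloc(1024 * 1024);
        if (p == NULL)
            break;
        memset(p, 0xAA, 1024 * 1024);
        mib++;
    }
    fprintf(stderr, "malloc failed after %zu MiB\n", mib);
    return 0;
}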

If the system behaves normally and remains responsive up to the point when the error is displayed, then I would say yes, it is worth checking for. OTOH, if the system becomes slow, unresponsive and eventually unusable before the message is displayed (if it ever is), then I would say no, it is not worth checking for.

Important: Before running this test, save all important work. Do not run it on a production server.

Regarding the Linux OOM behaviour - this is actually desirable and is the way that most OSes work. It's important to realise that when you malloc() some memory you are NOT getting it directly from the OS, you are getting it from the C runtime library. This will typically have asked the OS for a big chunk of memory up front (or at the first request), which it then manages via the malloc/free interface. As many programs never use dynamic memory at all, it would be undesirable for the OS to hand "real" memory to the C runtime - instead it hands over some uncommitted virtual memory, which will actually be committed as you make your malloc calls.

anon
I think it's worth checking anyway: it's better to get an error message rather than a core dump.
Lars Wirzenius
You may very well not get either.
anon
+12  A: 

Out of memory conditions can happen even on modern computers with lots of memory, if the user or system administrator restricts (see ulimit) the memory space for a process, or if the operating system supports memory allocation limits per user. In pathological cases, fragmentation can even make this fairly likely.

However, since the use of dynamically allocated memory is prevalent in modern programs, for good reasons, handling out-of-memory errors becomes very hairy. Checking and handling errors of this kind would have to be done everywhere, at a high cost in complexity.

I find that it is better to design the program so that it can crash at any time. For example, make sure data the user has created gets saved on disk all the time, even if the user does not explicitly save it. (See vi -r, for example.) This way, you can create a function to allocate memory that terminates the program if there is an error. Since your application is designed to handle crashes at any time, it's OK to crash. The user will be surprised, but won't lose (much) work.

The never-failing allocation function might be something like this (untested, uncompiled code, for demonstration purposes only):

#include <errno.h>
#include <stdlib.h>

/* Callback function so application can do some emergency saving if it wants to. */
static void (*safe_malloc_callback)(int error_number, size_t requested);

void safe_malloc_set_callback(void (*callback)(int, size_t))
{
    safe_malloc_callback = callback;
}

void *safe_malloc(size_t n)
{
    void *p;

    if (n == 0)
        n = 1; /* malloc(0) is not well defined. */
    p = malloc(n);
    if (p == NULL) {
        if (safe_malloc_callback)
            safe_malloc_callback(errno, n);
        exit(EXIT_FAILURE);
    }
    return p;
}
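
A hypothetical usage sketch, assuming the definitions above:

#include <stdio.h>

/* Hypothetical callback: log the failure (stderr is unbuffered, so this
 * should not need to allocate) and autosave before the process exits. */
static void emergency_save(int error_number, size_t requested)
{
    fprintf(stderr, "allocating %zu bytes failed (errno %d); autosaving\n",
            requested, error_number);
    /* ... write the autosave file here ... */
}

int main(void)
{
    safe_malloc_set_callback(emergency_save);
    char *buf = safe_malloc(4096); /* never returns NULL */
    /* ... use buf ... */
    free(buf);
    return 0;
}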

Valerie Aurora's article Crash-only software might be illuminating.

Lars Wirzenius
Just nitpicking here: "n == 1;" is a no-op, and there's a syntax error on line 17.
dreamlax
A: 

Checking for OOM conditions and taking appropriate action can be hard if the software is badly designed. Whether you actually need to check for such situations depends on the reliability you want your software to achieve.

E.g. the VirtualBox hypervisor will detect out-of-memory errors and gracefully pause the virtual machine, allowing the user to close some applications to free memory. I observed such behavior under Windows. Actually, almost all calls in VirtualBox have a success indicator as their return value, and you can just return VERR_NO_MEMORY to denote that a memory allocation failed. This introduces some additional checks, but in this case it is worth it.
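A minimal sketch of that convention (illustrative names, not VirtualBox's actual VERR_* definitions or API):

#include <stdlib.h>

enum { ERR_SUCCESS = 0, ERR_NO_MEMORY = -8 };

struct widget { int id; };

/* Allocation failure becomes just another reportable status code,
 * so the caller can pause, free something, and retry. */
int widget_create(int id, struct widget **out)
{
    struct widget *w = malloc(sizeof *w);
    if (w == NULL)
        return ERR_NO_MEMORY;
    w->id = id;
    *out = w;
    return ERR_SUCCESS;
}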

dragonfly
One of the issues with this is that **the post-OOM procedure can't allocate any memory.** Providing an indication to the user without using additional memory in the process is often non-trivial or impossible.
cdleary
Of course, but this is actually solvable, since the post-OOM procedure will probably be able to allocate a small amount of memory. Returning to the example: the hypervisor allocates memory for the guest OS in big chunks, so failing to allocate a chunk does not actually consume that large chunk. There are also other techniques to achieve the goal.
dragonfly
+6  A: 

Look at the other side of the question: if you malloc memory, it fails, and you don't detect it at the malloc, when will you detect it?

Obviously, when you attempt to dereference the pointer.

How will you detect it? By getting a Bus error or something similar, somewhere after the malloc that you'll have to track down with a core dump and the debugger.

On the other hand, you can write

#define OOM 42 /* just some number */

/* ... */

if ((ptr = malloc(size)) == NULL) {
    /* a well-behaved fprintf should NOT malloc, so it can be used
     * in this sort of context
     */
    fprintf(stderr, "OOM at %s:%d\n", __FILE__, __LINE__);
    exit(OOM);
}

and get "OOM at parser.c:447".

You pick.

Update

Good question about graceful return. The difficulty with assuring a graceful return is that in general you really can't set up a paradigm or pattern of how you do that, especially in C, which is after all a fancy assembly language. In a garbage-collected environment, you could force a GC; in a language with exceptions, you can throw an exception and unwind things. In C you have to do it yourself and so you have to decide how much effort you want to put into it.

In most programs, abnormally terminating is about the best you can do. In this scheme you (hopefully) get a useful message on stderr -- of course it could also be to a logger or something like that -- and a known value as the return code.

High-reliability programs with short recovery times push you into something like recovery blocks, where you write code that attempts to get a system back into a survivable state. These are great, but complicated; the paper I linked to talks about them in detail.

In the middle, you can come up with a more complicated memory management scheme, say managing your own pool of dynamic memory -- after all, if someone else can write malloc, so can you.
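One classic middle-ground trick, sketched here with hypothetical names, is to hold back a "rainy day" reserve at startup and release it on failure, so there is headroom for an orderly shutdown or one retry:

#include <stdlib.h>

static void *reserve; /* "rainy day" block, allocated at startup */

void reserve_init(size_t n)
{
    reserve = malloc(n);
}

void *reserve_malloc(size_t n)
{
    void *p = malloc(n);
    if (p == NULL && reserve != NULL) {
        free(reserve);  /* give the heap some breathing room */
        reserve = NULL;
        p = malloc(n);  /* one retry before giving up */
    }
    return p;
}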

But there's just no general pattern (of which I'm aware anyway) for cleaning up enough to be able to return reliably and let the surrounding program continue.

Charlie Martin
And you can use a macro and/or wrapper function to make it easier.
Artelius
Just a note: although the title only mentions 'detection,' the question also asks about the necessity for graceful return to the caller. I'd be interested in your perspective on that as well - I understand your point that calling a malloc macro isn't any more difficult than calling malloc itself. :-)
cdleary
The difficulty of graceful return is that, as a general statement, you can't. This is gonna take more than a comment.
Charlie Martin
+6  A: 

It depends on what you're writing. Is it a general-purpose library? If so, you want to deal with a lack of memory as gracefully as possible, particularly if it's reasonable to expect that it will be used on el-cheapo systems or embedded devices.

Consider this: a programmer is using your library. There is a bug (an uninitialised variable, perhaps) in his program that passes a silly argument to your code, which consequently tries to allocate a single 3.6GB block of memory. Obviously malloc() returns NULL. Would he rather get an unexplained segfault somewhere inside the library code, or a return value indicating the error?

To avoid having error checks all over your code, one approach is to allocate a reasonable amount of memory at the start, and sub-allocate it as required.
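A sketch of that approach (hypothetical names): one up-front allocation, then bump-pointer sub-allocation from it:

#include <stddef.h>
#include <stdlib.h>

static unsigned char *arena;
static size_t arena_size, arena_used;

/* Grab one block at startup; this is the only malloc() check needed. */
int arena_init(size_t size)
{
    arena = malloc(size);
    arena_size = size;
    arena_used = 0;
    return arena != NULL ? 0 : -1;
}

/* Hand out 8-byte-aligned chunks; individual chunks are never freed. */
void *arena_alloc(size_t n)
{
    n = (n + 7) & ~(size_t)7;
    if (arena == NULL || n > arena_size - arena_used)
        return NULL;
    void *p = arena + arena_used;
    arena_used += n;
    return p;
}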

Regarding the Linux OOM killer, I heard that this behaviour is now disabled by default on major distros. Even if it's enabled, don't get the wrong idea: malloc() can return NULL, and it certainly will if your program's total memory use would surpass 4GiB (on a 32-bit system). In other words, even if malloc() doesn't actually secure you some RAM/swap space, it does reserve part of your address space.

Artelius
+1  A: 

Processes are usually run with a resource limit (see ulimit(3)) on the stack size, but not on the heap size. malloc(3) grows its heap area page-by-page as needed, and the operating system arranges for each page to be physically backed and mapped into your process's heap. If there is no more RAM in your computer, most operating systems have something like a swap partition on disk. When your system starts to need swap, things gradually get slow. If one process is responsible, it can easily be identified with a utility like ps(1).

Unless your code is to run with a resource limit, or on a system with very little memory and no swap, I think one may program with the assumption that malloc(3) succeeds. If you're not certain, just make a dummy wrapper that may someday do the check, and for now simply exits. An error-status return value does not make sense, as your program requires the memory it has already allocated. If your malloc(3) fails and you don't check for NULL, your process will die anyway when it starts accessing the (NULL) pointer it got.

Problems with malloc(3) in most cases do not arise from out-of-memory conditions, but from a logical error in your program that leads to misbehaved calls to malloc and free. This usual kind of problem will not be detected by checking whether malloc succeeds.

hept
+1  A: 

Well, it all depends on the situation.

First of all: if you have detected that memory is insufficient for your needs, what will you do? The most common usage is:

if (ptr == NULL) {
    fprintf(log /* stderr or anything */, "Cannot allocate memory\n");
    exit(2);
}

Well, even if fprintf does not call malloc directly, it may allocate buffers. Additionally, if it is a GUI application, too bad: your user is unlikely to spot the message. If your user is 'smart enough' to run the application from a console to check for errors, he will probably see that something ate his whole memory anyway. OK, so maybe display a dialog? But displaying a dialog may eat resources - and it usually will.

Secondly - why do you need the information about OOM? It usually happens in one of a few cases:

  1. Other software is buggy. You cannot do anything about it.
  2. Your program is buggy. In that case it is either a GUI program, in which case you are unlikely to be able to notify the user in any way (not to mention that 99% of users do not read the messages and will say the software crashed without further details), or, if it is not, the user is likely to spot the problem anyway (by observing system monitors or using more specialized software).
  3. You want to free some caches etc. You can check for this, but be warned that it will likely not work: you can only account for your own sbrk/mmap/etc. calls, and on Linux you will get the OOM killer anyway.
Maciej Piechotka