How should a multi-threaded C application handle a failed malloc()?

+7 Q:

A part of an application I'm working on is a simple pthread-based server that communicates over a TCP/IP socket. I am writing it in C because it's going to be running in a memory constrained environment. My question is: what should the program do if one of the threads encounters a malloc() that returns NULL? Possibilities I've come up with so far:

No special handling. Let malloc() return NULL and let it be dereferenced so that the whole thing segfaults.
Exit immediately on a failed malloc(), by calling abort() or exit(-1). Assume that the environment will clean everything up.
Jump out of the main event loop and attempt to pthread_join() all the threads, then shut down.

The first option is obviously the easiest, but seems very wrong. The second one also seems wrong since I don't know exactly what will happen. The third option seems tempting except for two issues: first, all of the threads need not be joined back to the main thread under normal circumstances and second, in order to complete the thread execution, most of the remaining threads will have to call malloc() again anyway.

What shall I do?

+3 A:

There's nothing wrong with option 2. You don't have to assume - exit() exits the process, which means all the threads are torn down and everything is cleaned up.

Don't forget to try to log where the failed allocation occurred.
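A minimal sketch of option 2 with logging might look like the following. The names `xmalloc_at` and `XMALLOC` are hypothetical, not part of any library, and the sketch assumes that writing one line to stderr is still possible when the heap is exhausted.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate, or log the call site and terminate. */
static void *xmalloc_at(size_t size, const char *file, int line)
{
    /* malloc(0) may legitimately return NULL, so ask for at least 1 byte. */
    void *p = malloc(size ? size : 1);
    if (p == NULL) {
        /* stderr is typically unbuffered, so the message goes out at once. */
        fprintf(stderr, "%s:%d: malloc(%zu) failed\n", file, line, size);
        /* exit() ends the whole process, including every other thread. */
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Records the file and line of the caller automatically. */
#define XMALLOC(size) xmalloc_at((size), __FILE__, __LINE__)
```

A thread would then write `char *buf = XMALLOC(256);` and never needs to check the result for NULL.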

caf 2010-05-14 11:45:13

That makes sense, thanks.

ipartola 2010-05-14 12:23:59

"everything" is a bit optimistic. For instance, you have to make sure that files on disk are properly cleaned up.

MSalters 2010-05-14 13:41:04

"everything is cleaned-up" in the sense that resources are returned to the OS; but you better hope that this is not *your* life-support system! The consequences of a system halting non-deterministically may not be that simple and are application dependent.

Clifford 2010-05-14 13:47:54

If my life-support system is a server accepting TCP/IP connections, then I must already be tired of life!

caf 2010-05-14 14:00:15

Agreed, life support was perhaps just an extreme example. There are many other instances, however, where spontaneous termination of a process would have consequences beyond just the health of the hardware. For example, it must have happened to you, as to all of us, that a crashing application has lost you hours of work? My point was that "it's OK, because it will not leak or leave open handles or threads running" is a rather narrow view of "cleaned up".

Clifford 2010-05-15 08:16:38

My judgement was that the OP's question was about such a narrow view of "cleaned up" - the OP just seemed to be concerned about leaving system resources allocated. As an aside, trying to save the current application status when `malloc()` has failed is likely to be impossible - a better strategy is for the application to create regular checkpoints to persistent storage while things are still running OK.

caf 2010-05-16 01:05:33

Depends on your architecture I think.

Does the malloc() failing mean that just that thread can't continue or is the entire process borked in that circumstance?

Generally when memory is really tight (i.e. microprocessor environments) it is a good idea to avoid ALL dynamic memory allocation to avoid issues like this.

GrahamS 2010-05-14 12:25:26

It means that the thread cannot continue, but the circumstances are such that each individual thread uses fairly little memory, and I put a limit on the number of threads to be created. malloc returning NULL would mean that something else went wrong with the system and some other process took up the majority of the memory.

ipartola 2010-05-14 13:30:11

+2 A:

There's a fourth option: free some memory (caches are always good candidates) and try again.

If you cannot afford this, I'd choose option 2 (logging or printing some kind of error message, obviously)... The only concern about cleanup would be closing the opened network connections in an orderly manner, so the clients know that the application on the other side is shutting down rather than find an unexpected connectivity problem.
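The "free some memory and try again" idea could be sketched roughly as below. Everything here is an assumption for illustration: the single-block `cache_block`, the `release_cache` hook and `malloc_retry` are hypothetical names, and a real cache would be application specific. The hook takes a mutex because any thread may hit the failure.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical demo cache: one spare block that can be given back. */
static void *cache_block;
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Releases cached memory; returns 1 if something was freed, 0 otherwise. */
static int release_cache(void)
{
    pthread_mutex_lock(&cache_lock);
    int freed = (cache_block != NULL);
    free(cache_block);
    cache_block = NULL;
    pthread_mutex_unlock(&cache_lock);
    return freed;
}

/* Retries malloc() for as long as the hook manages to free something. */
static void *malloc_retry(size_t size, int (*release)(void))
{
    void *p = malloc(size);
    while (p == NULL && release != NULL && release())
        p = malloc(size);
    return p; /* still NULL if nothing more could be released */
}
```

If `malloc_retry()` still returns NULL, the caller falls back to option 2.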

fortran 2010-05-14 12:31:13

Yes, I thought/read about this option. In this case that's not an option, since there are no caches. Do sockets/file descriptors not get destroyed automatically upon exiting the program?

ipartola 2010-05-14 13:25:36

They should be, since they're handled by the kernel. They'll be closed automatically when the process terminates.

Wyzard 2010-05-14 13:31:22

Yes, they're closed, but maybe your application has some specific protocol to disconnect rather than just shutting down the stream.

fortran 2010-05-18 08:49:34

+3 A:

This is one of the reasons that space / rad hard systems generally forbid dynamic memory allocation. When malloc() fails, it's extremely hard to 'cure' the failure. You do have some options:

You are not required to use the built-in libc malloc() (at all, or as usual). You can wrap malloc() to do extra work on failures, such as notifying something else. This is helpful when using something like a watchdog. You can also use a full-blown garbage collector, though I don't recommend it. It's better to identify and fix leaks.
Depending on storage and complexity, infrequently accessed allocated blocks could be mapped to disk. But here, typically, you are only looking at a few KB of savings in physical memory.
You can use a static pool of memory and your own malloc() that won't oversell it. If you have profiled your heap usage extensively (using a tool like Valgrind's massif or similar), you can reasonably size the pool.

However, what most of those suggestions boil down to is not trusting / using the system malloc() if failure is not an option.

In your case, I think the best thing you can do is make sure a watchdog is notified in the event that malloc() fails, so that your process (or the whole system) can be re-started. You don't want it looking 'alive and running' while in deadlock. This could be as simple as just unlinking a file.
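One way the "unlink a file" signal could work is sketched below, assuming a hypothetical convention in which the server creates a marker file at startup and a separate watchdog restarts the server when the file disappears. The function names are made up for this example. `unlink()` is a plain system call, so it should not need any heap memory itself.

```c
#include <fcntl.h>
#include <unistd.h>

/* Called once at startup: create the marker file the watchdog looks for. */
static int heartbeat_create(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    close(fd);
    return 0;
}

/* Called when malloc() fails: removing the marker tells the watchdog to act. */
static void heartbeat_remove(const char *path)
{
    (void)unlink(path);
}

/* What the watchdog side would poll: nonzero while the marker exists. */
static int heartbeat_present(const char *path)
{
    return access(path, F_OK) == 0;
}
```

The failure path would call `heartbeat_remove()` before logging and exiting, so the watchdog notices even if the process then hangs.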

Write very detailed logs. What file / line / function did the failure happen?

If malloc() fails when trying to get just a few KB, it's a good sign that your process really can't continue reliably anyway. If it fails grabbing a few hundred MB, you may be able to recover and keep going. By that token, whatever action you take should be based on just how much memory you were trying to get, and whether calls to allocate a much smaller size still succeed.
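That size-based decision might be sketched as follows. The 1 KB `PROBE_SIZE` threshold, the `alloc_verdict` enum and `try_alloc` are assumptions chosen for illustration, not values from the answer.

```c
#include <stdlib.h>

/* Hypothetical threshold: requests at or below this count as "small". */
#define PROBE_SIZE 1024

enum alloc_verdict { ALLOC_OK, ALLOC_RECOVERABLE, ALLOC_FATAL };

/* Tries to allocate `size` bytes and classifies a failure by its severity. */
static enum alloc_verdict try_alloc(size_t size, void **out)
{
    *out = malloc(size ? size : 1);
    if (*out != NULL)
        return ALLOC_OK;

    /* A small request failed: the process is unlikely to be able to go on. */
    if (size <= PROBE_SIZE)
        return ALLOC_FATAL;

    /* A large request failed: check whether a small one still succeeds. */
    void *probe = malloc(PROBE_SIZE);
    if (probe == NULL)
        return ALLOC_FATAL;
    free(probe);
    return ALLOC_RECOVERABLE;
}
```

On `ALLOC_RECOVERABLE` the caller can drop the one request and keep serving; on `ALLOC_FATAL` it notifies the watchdog and exits.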

The one thing you never want to do is just operate on NULL pointers and let it crash. It's just sloppy, provides no useful logging of where things went wrong and gives the impression that your software is of low / unstable quality.

Tim Post 2010-05-14 13:02:47

Wow, thanks for the detailed answer. I think your suggestion of notifying a watchdog would work well for my case. In this program, I'm only requesting small bits of memory (almost always under 1KB and most commonly under 100 bytes), but it's not known in advance how much of it I will need, so there's nothing I can do to free up memory in case malloc returns NULL.

ipartola 2010-05-14 13:27:56

@ipartola - yes, if you can't allocate 1k, something is seriously wrong. Either way, a reasonably smart watchdog should do the job until you fix every leak / etc and really tailor everything running to cooperate in a small space.

Tim Post 2010-05-14 18:36:38

From personal experience, I can tell that the frequency of malloc failures is often overestimated. For instance, in Linux the usual "solution" is a variant of 2, and you don't get a malloc failure. A process just suddenly dies. On larger systems the application tends to die because a user or watchdog kills it, once the swapping has made it unresponsive.

This makes cleanup a bit harder, and it also makes it hard to come up with a general solution.

MSalters 2010-05-14 13:40:07

I figure it's not going to be common. In cases where I'd need to allocate megabytes at a time, I'd probably have more freedom to free up caches, etc. In this case, I'm allocating very little memory so the only way it can really happen is if another process on the system goes crazy. I am just doing this to do it semi-correctly without cluttering the source code.

ipartola 2010-05-14 21:27:43

Is this running on an OS? The use of pthreads suggests so. Do you even know that malloc() will ever return NULL? On some systems (Linux for example) the fault will occur within malloc() and will be handled by the OS (by killing the process) without malloc() returning.

I would suggest that you allocate a memory pool at initialisation of your application and allocate from that rather than using malloc() after initialisation. This will give you control over the memory allocation algorithm and the behaviour when memory is exhausted. If there is insufficient memory for the pool, there will be a single point of failure at initialisation before your app has had a chance to start anything it cannot finish.

In real-time and embedded systems it is common to use a 'fixed-block memory allocator'. If your OS does not provide such a service, it can be implemented by pre-allocating memory blocks and placing their pointers on a queue. To allocate a block you take a pointer from the queue, and to release it you place it back on the queue. When the queue is empty, memory is exhausted, and you can either baulk and handle the error, or block and wait until another thread returns some memory. You may want to create multiple pools with different sized blocks, or even create a pool for a specific purpose with blocks the precise size needed for that purpose.
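A fixed-block allocator along those lines could be sketched with pthreads as below. The block size, block count and function names are assumptions for illustration. Free pointers are kept on a simple stack rather than a strict queue, which makes no difference when all blocks are the same size, and the storage is static rather than obtained from malloc() at startup. C11 is assumed for `_Alignas`.

```c
#include <pthread.h>
#include <stddef.h>

#define BLOCK_SIZE  128 /* assumed block size, a multiple of the alignment */
#define BLOCK_COUNT 64  /* assumed number of blocks in the pool */

/* All memory is reserved up front, so exhaustion is a known, local event. */
static _Alignas(max_align_t) unsigned char storage[BLOCK_COUNT][BLOCK_SIZE];
static void *free_blocks[BLOCK_COUNT];
static size_t free_count;
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pool_nonempty = PTHREAD_COND_INITIALIZER;

/* Call once before any thread uses the pool. */
static void pool_init(void)
{
    for (size_t i = 0; i < BLOCK_COUNT; i++)
        free_blocks[i] = storage[i];
    free_count = BLOCK_COUNT;
}

/* wait != 0: block until a block is returned.
 * wait == 0: return NULL immediately when the pool is exhausted. */
static void *pool_get(int wait)
{
    void *p = NULL;
    pthread_mutex_lock(&pool_lock);
    while (free_count == 0 && wait)
        pthread_cond_wait(&pool_nonempty, &pool_lock);
    if (free_count > 0)
        p = free_blocks[--free_count];
    pthread_mutex_unlock(&pool_lock);
    return p;
}

/* Returns a block to the pool and wakes one waiting thread, if any. */
static void pool_put(void *p)
{
    pthread_mutex_lock(&pool_lock);
    free_blocks[free_count++] = p;
    pthread_cond_signal(&pool_nonempty);
    pthread_mutex_unlock(&pool_lock);
}
```

The blocking mode only makes sense if some other thread is certain to return a block eventually; otherwise the non-blocking mode with an explicit error path is the safer choice.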

Clifford 2010-05-14 13:44:55

This program will run on a *nix flavored system, but likely with very little RAM and no swap. I wonder how common it is among UNIX programs to allocate all their memory pools ahead of time. It sounds like that might be overkill for my situation, but I'll definitely keep it in mind.

ipartola 2010-05-14 21:21:51
