I'm writing a caching app that consumes large amounts of memory.

Hopefully, I'll manage my memory well enough, but I'm just thinking about what to do if I do run out of memory.

If a call to allocate even a simple object fails, is it likely that even a syslog call will also fail?

EDIT: Ok perhaps I should clarify the question. If malloc or new returns a NULL or 0L value then it essentially means the call failed and it can't give you the memory for some reason. So, what would be the sensible thing to do in that case?

EDIT2: I've just realised that new will throw an exception (std::bad_alloc) on failure rather than return NULL. This could be caught at a higher level so I can perhaps exit gracefully further up. At that point, it may even be possible to recover, depending on how much memory is freed. At the least, I should by that point hopefully be able to log something. So while I have seen code that checks the value of a pointer after new, it is unnecessary. In C, however, you should check the return value of malloc.
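To make the three failure conventions concrete, here is a minimal sketch. The helper names (try_malloc, try_new, try_new_nothrow) are made up for illustration, and it assumes a conforming implementation; on an overcommitting system all three may report success even when the memory cannot actually be backed.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// C style: malloc reports failure by returning a null pointer,
// so the return value has to be checked.
bool try_malloc(std::size_t bytes)
{
    void* p = std::malloc(bytes);
    if (p == nullptr)
        return false;
    std::free(p);
    return true;
}

// Plain new reports failure by throwing std::bad_alloc;
// checking the returned pointer against null is pointless.
bool try_new(std::size_t bytes)
{
    try {
        char* p = new char[bytes];
        delete[] p;
        return true;
    } catch (const std::bad_alloc&) {
        return false;
    }
}

// new (std::nothrow) is the variant that does return a null pointer.
bool try_new_nothrow(std::size_t bytes)
{
    char* p = new (std::nothrow) char[bytes];
    if (p == nullptr)
        return false;
    delete[] p;
    return true;
}
```

Only the nothrow form of new needs a pointer check; with plain new the check belongs in a catch block further up.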

+10  A: 

Well, if you are in a case where there is a failure to allocate memory, you're going to get a std::bad_alloc exception. The exception causes the stack of your program to be unwound. In all likelihood, the inner loops of your application logic are not going to be handling out-of-memory conditions; only higher levels of your application should be doing that. Because the stack is getting unwound, a significant chunk of memory is going to be freed -- which in fact should be almost all the memory used by your program.

The one exception to this is when you ask for a very large (several hundred MB, for example) chunk of memory which cannot be satisfied. When this happens though, there's usually enough smaller chunks of memory remaining which will allow you to gracefully handle the failure.

Stack unwinding is your friend ;)

EDIT: Just realized that the question was also tagged with C -- if that is the case, then you should be having your functions free their internal structures manually when out of memory conditions are found; not to do so is a memory leak.
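For the C-style case, a rough sketch of "free your internal structures manually" might look like the following. The Record type and the function names are hypothetical, and the lengths are assumed to be nonzero (malloc(0) may legitimately return a null pointer).

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical structure owning two heap buffers.
struct Record {
    char* key;
    char* value;
};

// Builds a Record. If any allocation fails, everything allocated so far
// is released before returning nullptr, so a failure never leaks.
Record* record_create(std::size_t key_len, std::size_t value_len)
{
    Record* r = static_cast<Record*>(std::malloc(sizeof(Record)));
    if (r == nullptr)
        return nullptr;

    r->key = static_cast<char*>(std::malloc(key_len));
    r->value = static_cast<char*>(std::malloc(value_len));
    if (r->key == nullptr || r->value == nullptr) {
        std::free(r->key);    // free(nullptr) is a no-op
        std::free(r->value);
        std::free(r);
        return nullptr;
    }
    return r;
}

// Releases a Record created by record_create; accepts nullptr.
void record_destroy(Record* r)
{
    if (r == nullptr)
        return;
    std::free(r->key);
    std::free(r->value);
    std::free(r);
}
```

In C++ the same cleanup comes for free from destructors during stack unwinding, which is the point of the answer above.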

EDIT2: Example:

#include <iostream>
#include <new>      // std::bad_alloc
#include <vector>

void DoStuff()
{
    std::vector<int> data;
    //insert a whole crapload of stuff into data here.
    //Assume a std::vector operation such as push_back, or
    //data.resize(SOME_LARGE_VALUE_HERE), does the actual throwing.
}

int main()
{
    try
    {
        DoStuff();
        return 0;
    }
    catch (const std::bad_alloc& ex)
    {   //Observe that the local variable `data` no longer exists here.
        std::cerr << "Oops. Looks like you need to use a 64 bit system (or "
                     "get a bigger hard disk) for that calculation!\n";
        return -1;
    }
}

EDIT3: Okay, according to commenters there are systems out there which do not follow the standard in this regard. On the other hand, on such systems, you're going to be SOL in any case, so I don't see why they merit discussion. But if you are on such a platform, it is something to keep in mind.

Billy ONeal
If you are going to downvote an answer, please... *please* comment as to why you downvoted. I get tired of not being able to improve the answer because people won't tell me what they don't like.
Billy ONeal
there is no guarantee bad_alloc will be thrown. malloc impl can just mmap anonymous (in fact it does), and attempt to write to the allocated area will cause segv, not exception.
stepancheg
@stepancheg: If you are calling `malloc`, yes, you will have to handle the case where it returns `NULL`. However, Standard C++ requires that `std::bad_alloc` be thrown when a `new` allocation cannot be satisfied.
Billy ONeal
@billy there is no difference between c and c++: segv will be sent not on malloc/new, but when you access allocated area.
stepancheg
@stepancheg: Any system that does that does not conform to either the C or C++ standards. According to both standards, `malloc` must return `NULL` on failure. If your system does not do that, then that system cannot claim to be using a standards-compliant C or C++ implementation. Of course, if `malloc` returns `NULL` and you try to dereference the pointer, you end up in undefined behavior land (which on POSIX systems will be `SIGSEGV`, and on Windows systems will be `EXCEPTION_ACCESS_VIOLATION`).
Billy ONeal
@stepancheg is right that such C and C++ implementations exist. They do not conform to the standards, but they are common enough that they need to be dealt with. Sometimes there is no graceful way out, such as when the first sign of memory exhaustion is being killed by the OOM killer.
Philip Potter
@Philip: Well, I don't think it's reasonable to ask a C++ question when we are talking about a platform which is not C++ (because a non-standards-conforming implementation is *not* C++). More to the point, the OP tagged the question with Linux, which 99.9% of the time is `gcc` or `icc`, in which case the standard is followed.
Billy ONeal
@billy standard linux malloc behaves as i described.
stepancheg
@Billy: isn't linux notorious for its OOM killer? the implementation isn't just the compiler, it's the library and the host environment. Normally I agree that we shouldn't pander to nonstandard implementations, but out of memory is so rarely held according to the standard that we have no choice.
Philip Potter
@stepancheg: Err.. last I checked, Linux (and pretty much every other POSIX system) used Doug Lea's `malloc`, which does not behave in this way.
Billy ONeal
Related: [To New, Perchance To Throw, Part 2](http://www.gotw.ca/publications/mill16.htm).
James McNellis
@Philip: Okay, upon further reading, it looks like Linux does follow this behavior -- but only when any possible backing store for memory has been exhausted. The much more common out of memory problem is when a process runs out of address space. And this solution handles that kind of failure just fine.
Billy ONeal
@billy i checked myself on my netbook with ubuntu 10.04:

    #include <stdio.h>
    #include <stdlib.h>
    int main() { printf("%d\n", malloc(700 * 1024 * 1024) != NULL); return 0; }

this didn't fail while top showed 350 mb free + 50 mb buffers. Formatting lost, but i think you got the idea.
stepancheg
@stepancheg: You are muddying the complete pictures. Linux *can* follow an "optimistic" memory allocation scheme, in which it will allocate to a process memory which it might not have at the time of allocation. Allocations *can still be denied*, and either return `NULL`, in the case of `malloc`, or throw an exception, in the case of C++'s `new`, *exactly as the standard says*. If an optimistic allocation can't be fulfilled later, Linux will terminate the process via the OOM killer. I highly doubt that it does this by sending a SIGSEGV - I would expect something much more severe.
Thanatos
...so, [citation needed]. This whole behavior can be turned off, via a flag in `/proc`
Thanatos
@stepancheg: So you don't have 700 MB of memory between physical and swap space? Just because memory is not committed until it is actually used does not mean the system won't be able to support that. Remember that memory to your C or C++ program has very little to do with physical memory.
Billy ONeal
@billy i have total 1 gb of ram and no swap. and moreover, i modified the program, so it successfully allocated (without free) 4 times 700 mb (in loop). it is definitely more memory than the system has.

    for (i = 0; i < 100; ++i) printf("%d\n", malloc(700 * 1024 * 1024) != NULL);
stepancheg
@Thanatos seem like you are right, oom killer just kills a process, not sends a signal
stepancheg
+1 because I agree with the comments on how to handle OOM. The issue of how non-conforming linux is, I consider orthogonal. On such platforms, an app which is going to deliberately use as much memory as possible should investigate any signals it receives or callbacks it can register for, for the OS to hint that it should free memory. For instance Qtopia has a series of "help! I'm running out of memory!" states, essentially with increasing numbers of exclamation marks as memory gets lower.
Steve Jessop
... and I'd add that even on a platform which behaves as Billy prefers, an app which deliberately uses as much memory as possible (or, equivalently, which has a steady memory leak) is already all kinds of anti-social. On a desktop OS it will likely be quit or forcibly killed eventually by one means or another, even if it's just the user trying to prevent the OS grinding to a halt or keeling over sideways. In a more restricted embedded environment you do sometimes find that everything works gracefully despite allocation failures.
Steve Jessop
@Steve: +1 to comment. That said, whenever I've seen these kinds of memory situations it's happened where a system which was not badly designed was just used to do something huge. For example, several hundred MB Excel workbooks. Sure, Excel is built right, but when you start using it to run calculations that take hours on a fast machine, it uses a ton of RAM. (Okay, you shouldn't use Excel for these kinds of calculations, but many businesses *do* use it for these kinds of jobs. Having the OS randomly terminate Excel would ruin their whole day)
Billy ONeal
Maybe, but when linux terminates an app, it terminates one which is using a lot of memory, which almost always means it could just as easily have had a failed allocation itself (maybe it allocated *all* its memory up front and never calls any system functions, but otherwise it could have). I doubt that huge Excel job would go any better if OOM was correctly reported than if it just died - either way your job doesn't get done. It's normally just a question of whether the app gets to find out (and report) *why* it failed. Hence annoying to those who check return values, as civilised beings do.
Steve Jessop
It'd be nice if your code example printed to std::cerr and returned a non-zero value in the case of bad_alloc.
@ja-cop: Done. (in 15 chars)
Billy ONeal
+3  A: 

I don't have any specific experience on Linux, but I spent a lot of time working in video games on games consoles, where running out of memory is verboten, and on Windows-based tools.

On a modern OS, you're most likely to run out of address space. Running out of memory, as such, is basically impossible. So just allocate a large buffer, or buffers, on startup, in order to hold all the data you'll ever need, whilst leaving a small amount for the OS. Writing random junk to these regions would probably be a good idea in order to force the OS to actually assign the memory to your process. If your process survives this attempt to use every byte it's asked for, there's some kind of backing now reserved for all of this stuff, so now you're golden.
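A minimal sketch of the "grab it up front and write junk to it" step, assuming a hypothetical helper name (reserve_and_touch) and plain malloc as the source of the buffer. Note that on an overcommitting system the process may be killed during the fill instead of getting a null pointer back, which is exactly the failure this is meant to flush out at startup rather than later.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Reserves `bytes` up front and writes to every byte so the OS has to
// commit real backing for the whole region. Returns nullptr if malloc
// itself refuses the request.
void* reserve_and_touch(std::size_t bytes)
{
    void* region = std::malloc(bytes);
    if (region == nullptr)
        return nullptr;

    // A nonzero fill pattern is used on purpose: some compilers may fold
    // malloc followed by a zeroing memset into calloc, which would not
    // necessarily touch the pages.
    std::memset(region, 0xA5, bytes);
    return region;
}
```

The returned region would then be handed to whatever custom memory manager the application uses.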

Write/steal your own memory manager, and direct it to allocate from these buffers. Then use it, consistently, in your app, or take advantage of the GNU linker's --wrap option (passed through gcc as -Wl,--wrap=malloc) to forward calls from malloc and friends appropriately. If you use any libraries that can't be directed to call into your memory manager, junk them, because they'll just get in your way. Lack of overridable memory management calls is evidence of deeper-seated issues; you're better off without this particular component. (Note: even if you're using --wrap, trust me, this is still evidence of a problem! Life is too short to use those libraries that don't let you overload their memory management!)

Once you run out of memory, OK, you're screwed, but you've still got that space you left free before, so if freeing up some of the memory you've asked for is too difficult you can (with care) call system calls to write a message to the system log and then terminate, or whatever. Just make sure to avoid calls to the C library, because they'll probably try to allocate some memory when you least expect it -- programmers who work with systems that have virtualised address spaces are notorious for this kind of thing -- and that's the very thing that has caused the problem in the first place.
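As a sketch of "report and terminate without touching the C library's allocator", the following uses only POSIX write(2) and _exit(2) with a message held in static storage. The function names and the message text are made up for illustration, and a write interrupted by a signal (EINTR) is simply treated as a failure here.

```cpp
#include <cstddef>
#include <unistd.h>   // write, _exit, STDERR_FILENO (POSIX)

// Writes `len` bytes of `msg` to `fd` using only write(2): no heap, no stdio.
// Returns true if the whole message was written.
bool emergency_write(int fd, const char* msg, std::size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, msg, len);
        if (n <= 0)
            return false;
        msg += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

// The message lives in static storage, so nothing is allocated at failure time.
static const char kOomMessage[] = "fatal: out of memory, shutting down\n";

// Last-resort exit path: report the condition and terminate immediately,
// skipping atexit handlers and stdio flushing that might allocate.
[[noreturn]] void die_out_of_memory()
{
    emergency_write(STDERR_FILENO, kOomMessage, sizeof(kOomMessage) - 1);
    _exit(1);
}
```

The same emergency_write could target a log file descriptor opened at startup instead of stderr.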

This approach might sound like a pain in the arse. Well... it is. But it's straightforward, and it's worth putting in a bit of effort for that. I think there's a Kernighan-and/or-Ritchie quote about this.

brone
BTW I have done SOME coding on Linux, but it was working on a program that never allocated more than 10MB of long-running buffers, none of which was more than 720K. So I didn't look into the issue of running out of address space; chance of that was close enough to zero for my taste.
brone
@brone: If the heap becomes fragmented (which happens quite quickly when all of your allocations are small), you can run out of address space well before you run out of backing store. Most systems nowadays have more than 4GB of RAM between Physical RAM and Swap Space.
Billy ONeal
Indeed - as stated in the answer. So grab your memory up front, manage the addresses yourself, and leave a bit free for the system. If you can't avoid hitting the memory limit, you can at least ensure that when you do there are addresses that are definitely not being used by your program, so that the system can work to at least some extent. (You could allocate a buffer that's freed when you start to run out, but if you've got multiple threads now you have to stop them, etc. - just doing your own memory management is much safer.)
brone
+10  A: 

Doesn't this question make assumptions regarding overcommitted memory?

I.e., an out of memory situation might not be recoverable! Even if you have no memory left, calls to malloc and other allocators may still succeed until the program attempts to use the memory. Then, BAM!, some process gets killed by the kernel in order to satisfy memory load.

Arafangion
If there is no memory left, a call to malloc is **required** to return a `NULL` pointer according to both the C and C++ standards.
Billy ONeal
@Daniel: Please look up "OOM killer". It's the thing which kills low-priority processes in low (virtual) memory conditions. Not all OSes have them, but some very common ones do.
Philip Potter
@Philip: How about that? Thanks for pointing this out, because I had no idea.
Daniel Trebbien
@Billy @Daniel Actually it may, so the downvotes are undeserved. At least in a lot of Linux distros memory over-commit is on by default. When memory over-commit is on, the kernel will always return success on all allocations and will only try to actually reserve memory when a process tries to write it. If there is none available, the infamous OOM killer is invoked and a random process dies (according to its OOM score). Which I think is seriously broken, but a lot of people (inc. Linus) defend it and say that handling OOM gracefully in user apps is a waste of time.
Alex B
@Alex: I think "may" is overstating it a bit - that's what they do, but as far as I can tell this is not permitted by the standard. But there are lots of things that compilers and OSes do that break the standard - as long as they have a mode which is compliant, then they're legit. Over-commit can be disabled if you want an environment which is compliant, although that requires an impractical amount of communication between the programmer and the user. In practice you can ignore Linus and handle OOM properly, but must accept the OS can kill your app at any time for any reason, including OOM.
Steve Jessop
@Alex B: I did not downvote this answer. For the record, I agree with you that this is seriously broken behavior.
Billy ONeal
@Billy (first comment): malloc will return NULL if it fails to allocate address space. But the availability of address space and the availability of memory are very different things.
Ben Voigt
@Ben: That does seem to be what Linux does. That does not mean that what Linux is doing conforms to standard.
Billy ONeal
@Arafangion: If you modify your answer to indicate that the behavior you describe is not standard behavior; rather it is behavior that is nonstandard and only appears on some specific operating systems, I will upvote your answer.
Billy ONeal
Where does "the standard" (I assume this means C99) require this? It refers only to "space" being allocated, not memory, so it seems like it could be interpreted as "address space".
Ben Voigt
@Ben: I believe that behavior runs afoul of *The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated)*. Even with that liberal interpretation of `malloc`'s behavior, the failure of your program as a result of that behavior is not standards compliant.
Billy ONeal
@Billy: Then the standard requires an uninterruptible power supply as well as perfectly error-free memory. In reality, memory can become inaccessible without being explicitly deallocated for a variety of reasons beyond the control of the C standard library.
Ben Voigt
Linux does that for performance reasons and also due to the simple fact that bad applications are pretty common nowadays: they allocate more memory than they actually use. Thus allocation of physical RAM happens only during first access. Side-effect of the implementation is that it is blazingly fast but can't check for errors. Advised practice for applications which need and can handle memory errors (after the telecom traditions) is to preallocate all required memory (as per configuration) in advance.
Dummy00001
@Ben: While that is true, there's nothing else in the C standard that has to do with physical concerns of the hardware. Any system which on purpose is not following the rules of the standard is not a standard-compliant platform, no matter how you want to look at it. @Dummy00001: How can you preallocate and check for errors if there's no way to know whether allocation succeeds?
Billy ONeal
Not exactly the answer I was looking for. i.e. a question to a question. But thanks for that, it's useful information and something to ponder over. I've accepted the other answer that also got 10 upvotes... it's probably all I can expect for now.
Matt H
+2  A: 

If your application is likely to allocate large blocks of memory and risks hitting the per-process or VM limits, waiting until an allocation actually fails is a difficult situation from which to recover. By the time malloc returns NULL or new throws std::bad_alloc, things may be too far gone to reliably recover. Depending on your recovery strategy, many operations may still require heap allocations themselves, so you have to be extremely careful on which routines you can rely.

Another strategy you may wish to consider is to query the OS and monitor the available memory, proactively managing your allocations. This way you can avoid allocating a large block if you know it is likely to fail, and will thus have a better chance of recovery.

Also, depending on your memory usage patterns, using a custom allocator may give you better results than the standard built-in malloc. For example, certain allocation patterns can actually lead to memory fragmentation over time, so even though you have free memory, the heap arena may not have an available block of the right size. A good example of this is Firefox, which switched to jemalloc and saw a great increase in memory efficiency.

gavinb
Agreed - rather than trying to keep using memory until there's none left, you should provide a configuration option which sets the maximum amount of memory your process will use for caching. When you go to cache a new object, check that you won't exceed the memory limit - if you will, throw out some old objects. In other words, punt the whole issue to the system administrator.
caf
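The configurable-limit idea in the comment above could be sketched roughly like this: a least-recently-used cache with a byte budget. The class name BudgetedCache is made up, keys and values are plain strings for simplicity, and only the value sizes are counted against the budget (keys and container overhead are ignored).

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

class BudgetedCache {
public:
    // max_bytes would come from a configuration option.
    explicit BudgetedCache(std::size_t max_bytes) : max_bytes_(max_bytes) {}

    // Inserts or replaces an entry, evicting least recently used entries
    // until the new value fits. Returns false if the value alone is
    // larger than the whole budget.
    bool put(const std::string& key, std::string value)
    {
        if (value.size() > max_bytes_)
            return false;
        remove(key);
        while (used_bytes_ + value.size() > max_bytes_)
            evict_oldest();
        used_bytes_ += value.size();
        entries_.emplace_front(key, std::move(value));
        index_[key] = entries_.begin();
        return true;
    }

    // Returns the cached value, or nullptr on a miss. A hit marks the
    // entry as most recently used.
    const std::string* get(const std::string& key)
    {
        auto it = index_.find(key);
        if (it == index_.end())
            return nullptr;
        entries_.splice(entries_.begin(), entries_, it->second);
        return &it->second->second;
    }

    std::size_t used_bytes() const { return used_bytes_; }

private:
    using Entry = std::pair<std::string, std::string>;

    void remove(const std::string& key)
    {
        auto it = index_.find(key);
        if (it == index_.end())
            return;
        used_bytes_ -= it->second->second.size();
        entries_.erase(it->second);
        index_.erase(it);
    }

    void evict_oldest()
    {
        const Entry& victim = entries_.back();
        used_bytes_ -= victim.second.size();
        index_.erase(victim.first);
        entries_.pop_back();
    }

    std::size_t max_bytes_;
    std::size_t used_bytes_ = 0;
    std::list<Entry> entries_;   // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

With this shape the process never deliberately approaches the system limit; the administrator decides how much memory the cache may use.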
Thanks, this is actually what I was intending to do, but then I thought this might be a good stackoverflow question
Matt H
@gavinb: `"A good example of this is Firefox, which switched to dmalloc and saw a great increase in memory efficiency."` Do you know where we can read more about this?
Lazer
A: 

As has been stated, exhausting memory means that all bets are off. IMHO the best method of handling this situation is to fail gracefully (as opposed to simply crashing!). Your cache could allocate a reasonable amount of memory on instantiation. The size of this memory would equate to an amount that, when freed, will allow the program to terminate reasonably. When your cache detects that memory is becoming low then it should release this memory and instigate a graceful shutdown.
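One way to sketch the "reserve that is released when memory gets low" idea in C++ is with std::set_new_handler. The names below (install_emergency_reserve, on_out_of_memory, low_memory) and the reserve size are assumptions for illustration; on an overcommitting system the reserve should probably also be written to once, so that it is really backed.

```cpp
#include <cstddef>
#include <new>

namespace {
    char* g_reserve = nullptr;            // emergency block held back at startup
    volatile bool g_low_memory = false;   // polled by the main loop
}

// Called by operator new when an allocation fails. The first time, it
// releases the reserve so the failed allocation can be retried and flags
// that a graceful shutdown should start. Once the reserve is gone, it
// removes itself so the next failure throws std::bad_alloc.
void on_out_of_memory()
{
    if (g_reserve != nullptr) {
        delete[] g_reserve;
        g_reserve = nullptr;
        g_low_memory = true;
        return;   // operator new retries the allocation
    }
    std::set_new_handler(nullptr);
}

// Allocates the reserve and installs the handler; call once at startup.
void install_emergency_reserve(std::size_t bytes)
{
    g_reserve = new char[bytes];
    std::set_new_handler(on_out_of_memory);
}

// True once the reserve has been released.
bool low_memory() { return g_low_memory; }
```

The main loop would check low_memory() between requests and begin an orderly shutdown when it turns true.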

Christopher Hunt
1) it is always nice to give an explanation why an answer is downvoted 2) @Chris, flagging your answer because you are unhappy that you were downvoted is an abuse of the moderation system. If you have questions or wish to rage against the machine, so to speak, please go to meta.stackoverflow.com, where you will be status-declined.
Will
Sorry, I didn't realise that it was an abuse of the moderation system and, also, I did not know about meta.stackoverflow.com at the time. Please realise though that I was not so much unhappy, I simply wanted to understand why I had been downvoted so that I could improve the answer in the spirit of improving SO's content. Thanks for the response.
Christopher Hunt
A: 

It's possible for writes to the syslog to fail in low memory conditions: there's no way to know that for every platform without looking at the source for the relevant functions. They could need dynamic memory to format strings that are passed in, for instance.

Long before you run out of memory, however, you'll start paging stuff to disk. And when that happens, you can forget any performance advantages from caching.

Personally, I'm convinced by the design behind Varnish: the operating system offers services to solve a lot of the relevant problems, and it makes sense to use those services (minor editing):

So what happens with Squid's elaborate memory management is that it gets into fights with the kernel's elaborate memory management ...

Squid creates a HTTP object in RAM and it gets used some times rapidly after creation. Then after some time it get no more hits and the kernel notices this. Then somebody tries to get memory from the kernel for something and the kernel decides to push those unused pages of memory out to swap space and use the (cache-RAM) more sensibly for some data which is actually used by a program. This however, is done without Squid knowing about it. Squid still thinks that these http objects are in RAM, and they will be, the very second it tries to access them, but until then, the RAM is used for something productive. ...

After some time, Squid will also notice that these objects are unused, and it decides to move them to disk so the RAM can be used for more busy data. So Squid goes out, creates a file and then it writes the http objects to the file.

Here we switch to the high-speed camera: Squid calls write(2), the address it gives is a "virtual address" and the kernel has it marked as "not at home". ...

The kernel tries to find a free page, if there are none, it will take a little used page from somewhere, likely another little used Squid object, write it to the paging ... space on the disk (the "swap area") when that write completes, it will read from another place in the paging pool the data it "paged out" into the now unused RAM page, fix up the paging tables, and retry the instruction which failed. ...

So now Squid has the object in a page in RAM and written to the disk two places: one copy in the operating system's paging space and one copy in the filesystem. ...

Here is how Varnish does it:

Varnish allocate some virtual memory, it tells the operating system to back this memory with space from a disk file. When it needs to send the object to a client, it simply refers to that piece of virtual memory and leaves the rest to the kernel.

If/when the kernel decides it needs to use RAM for something else, the page will get written to the backing file and the RAM page reused elsewhere.

When Varnish next time refers to the virtual memory, the operating system will find a RAM page, possibly freeing one, and read the contents in from the backing file.

And that's it. Varnish doesn't really try to control what is cached in RAM and what is not, the kernel has code and hardware support to do a good job at that, and it does a good job.

You may not need to write caching code at all.
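For readers who have not used it, the "back virtual memory with a disk file" approach described in the quote boils down to POSIX mmap. This is only a minimal sketch under that assumption, with made-up function names; it is not Varnish's actual code.

```cpp
#include <cstddef>
#include <fcntl.h>      // open
#include <sys/mman.h>   // mmap, munmap
#include <unistd.h>     // ftruncate, close

// Maps `bytes` of the file at `path` as shared, read-write memory,
// creating the file and setting its size if needed. The kernel decides
// which pages stay in RAM and writes dirty pages back to the file.
// Returns nullptr on failure.
void* map_backing_file(const char* path, std::size_t bytes)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return nullptr;
    if (ftruncate(fd, static_cast<off_t>(bytes)) != 0) {
        close(fd);
        return nullptr;
    }
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);   // the mapping stays valid after the descriptor is closed
    return p == MAP_FAILED ? nullptr : p;
}

// Releases a mapping created by map_backing_file; accepts nullptr.
void unmap_backing_file(void* p, std::size_t bytes)
{
    if (p != nullptr)
        munmap(p, bytes);
}
```

Objects are then placed inside the mapped region by the application's own allocator, and paging is left entirely to the kernel.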

Max Lybbert
This project has nothing to do with HTTP caching, nor does it use any current protocols for which servers already exist. So yes, I very much do need to write caching code.
Matt H
The fact that Varnish happens to be an HTTP server is irrelevant. Seriously. The point is that application caches often end up fighting with operating system services like virtual memory. A big cache can get swapped to disk and cause your application to run noticeably slower. Instead, just allocate a pool of memory and leave it up to the operating system to cache commonly used objects.
Max Lybbert
+1  A: 

I don't think that capturing the failure of malloc or new will gain you much in your situation. Linux allocates large chunks of virtual pages in malloc by means of mmap. Because of this, you may find yourself in a situation where you allocate much more virtual memory than you have (real + swap).

The program then will only fail much later with a segfault (SIGSEGV) when you write to the first page for which there isn't any place in swap. In theory you could test for such situations by writing a signal handler and then dirtying all pages that you allocate.

But usually this will not help much either, since your application will be in a very bad state long before that: constantly swapping, computing mechanically with your harddisk...

Jens Gustedt
This behaviour is known as overcommit, and is optional (controllable). If you turn it off, malloc will fail if there is strictly not enough virtual memory.
MarkR
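On Linux the overcommit setting mentioned above is exposed as /proc/sys/vm/overcommit_memory (0 = heuristic overcommit, 1 = always overcommit, 2 = strict accounting). A small sketch for checking it at startup, with a made-up function name:

```cpp
#include <fstream>

// Returns the Linux overcommit mode (0 = heuristic, 1 = always overcommit,
// 2 = strict accounting), or -1 if the setting cannot be read, for example
// on a non-Linux system.
int read_overcommit_mode()
{
    std::ifstream in("/proc/sys/vm/overcommit_memory");
    int mode = -1;
    if (!(in >> mode))
        return -1;
    return mode;
}
```

An application that relies on malloc returning NULL could log a warning at startup when the mode is not 2.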
Swapping definitely is a problem. When everything is running 6 or 7 orders of magnitude slower than normal, even the most perfect recovery logic becomes nearly useless.
Ben Voigt
A: 

I'm writing a caching app that consumes large amounts of memory. Hopefully, I'll manage my memory well enough, but I'm just thinking about what to do if I do run out of memory.

If you are writing a daemon which should run 24/7/365, then you should not use dynamic memory management: preallocate all the memory in advance and manage it using some slab allocator/memory pool mechanism. That will also protect you against heap fragmentation.
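A memory pool of the kind meant here can be quite small. The following is only a sketch of a fixed-size block pool (the class name FixedPool is made up): all storage is obtained once in the constructor, and afterwards allocation and release are pointer swaps on a free list that never call the system allocator.

```cpp
#include <cstddef>
#include <new>
#include <vector>

class FixedPool {
public:
    // Preallocates block_count blocks of at least block_size bytes each.
    FixedPool(std::size_t block_size, std::size_t block_count)
        : units_per_block_(units_for(block_size)),
          storage_(units_per_block_ * block_count)
    {
        for (std::size_t i = 0; i < block_count; ++i)
            deallocate(&storage_[i * units_per_block_]);
    }

    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    // Returns a free block, or nullptr when the pool is exhausted.
    void* allocate()
    {
        if (free_list_ == nullptr)
            return nullptr;
        Node* node = free_list_;
        free_list_ = node->next;
        return node;
    }

    // Returns a block obtained from allocate() to the pool.
    void deallocate(void* block)
    {
        free_list_ = new (block) Node{free_list_};
    }

private:
    struct Node { Node* next; };
    using Unit = std::max_align_t;   // keeps every block maximally aligned
    static_assert(sizeof(Unit) >= sizeof(Node), "block too small for free list");

    static std::size_t units_for(std::size_t bytes)
    {
        std::size_t units = (bytes + sizeof(Unit) - 1) / sizeof(Unit);
        return units == 0 ? 1 : units;
    }

    std::size_t units_per_block_;
    std::vector<Unit> storage_;
    Node* free_list_ = nullptr;
};
```

Because every block has the same size, the pool cannot fragment, and exhaustion shows up as a plain nullptr that request-handling code can turn into an error reply.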

If a call to allocate even a simple object fails, is it likely that even a syslog call will also fail?

It should not. This is partly the reason why syslog exists as a syscall: so that an application can report an error independently of its internal state.

If malloc or new returns a NULL or 0L value then it essentially means the call failed and it can't give you the memory for some reason. So, what would be the sensible thing to do in that case?

In these situations I generally try to handle the error condition properly, applying the general error handling rules. If the error happens during initialization, terminate with an error; it is probably a configuration error. If the error happens during request processing, fail the request with an out-of-memory error.

For plain heap memory, malloc() returning 0 generally means:

  • that you have exhausted the heap, and unless your application frees some memory, further malloc()s won't succeed.

  • that the allocation size is wrong: it is quite a common coding error to mix signed and unsigned types when calculating the block size. If the size mistakenly ends up negative and is passed to malloc(), where size_t is expected, it becomes a very large number.
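The second case is easy to demonstrate. In this sketch (the function names are made up), a negative int converted to size_t wraps around to a huge value, and a small checked helper rejects negative counts and multiplications that would overflow before anything reaches malloc().

```cpp
#include <cstddef>
#include <cstdint>

// What malloc() effectively sees when handed a signed count:
// the conversion to size_t wraps modulo 2^N, so -1 becomes SIZE_MAX.
std::size_t as_allocation_size(int count)
{
    return static_cast<std::size_t>(count);
}

// Computes count * elem_size into *out. Returns false for a negative
// count, a zero element size, or a product that does not fit in size_t.
bool checked_size(long long count, std::size_t elem_size, std::size_t* out)
{
    if (count < 0 || elem_size == 0)
        return false;
    unsigned long long n = static_cast<unsigned long long>(count);
    if (n > SIZE_MAX / elem_size)
        return false;
    *out = static_cast<std::size_t>(n) * elem_size;
    return true;
}
```

A malloc() failure caused by this kind of bug says nothing about how much memory is left, which is why logging the attempted size is useful.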

So in some sense it is also not wrong to abort() to produce a core file which can be analyzed later to see why malloc() returned 0. Though I prefer to (1) include the attempted allocation size in the error message and (2) try to proceed further. If the application crashes due to another memory problem down the road (*), it will produce a core file anyway.

(*) From my experience of making software with dynamic memory management resilient to malloc() errors, I see that malloc() often does not return 0 reliably. First attempts returning 0 are followed by a successful malloc() returning a valid pointer, but the first access to the pointed-to memory crashes the application. This is my experience on both Linux and HP-UX, and I have seen a similar pattern on Solaris 10 too. The behavior isn't unique to Linux. To my knowledge, the only way to make an application 100% resilient to memory problems is to preallocate all memory in advance. That is mandatory for mission-critical, safety, life-support and carrier-grade applications: they are not allowed dynamic memory management past the initialization phase.

Dummy00001
A: 

I don't know why many of the sensible answers are voted down. In most server environments, running out of memory means that you have a leak somewhere, and that it makes little sense to 'free some memory and try to go on'. The nature of C++ and especially the standard library is that it requires allocations all the time. If you are lucky, you might be able to free some memory and execute a clean shutdown, or at least emit a warning.

It is however far more likely that you won't be able to do a thing, unless the allocation that failed was a huge one, and there is still memory available for 'normal' things.

Dan Bernstein is one of the very few guys I know that can implement server software that operates in memory constrained situations.

For most of the rest of us, we should probably design our software so that it leaves things in a useful state when it bails out because of an out-of-memory error.

Unless you are some kind of brain surgeon, there isn't a lot else to do.

Also, very often you won't even get an std::bad_alloc or something like that, you'll just get a pointer in return to your malloc/new, and only die when you actually try to touch all of that memory. This can be prevented by turning off overcommit in the operating system, but still.

Don't count on being able to deal with the SIGSEGV when you touch memory that the kernel hoped you wouldn't. I'm not quite sure how this works on the Windows side of things, but I bet they do overcommit too.

All in all, this is not one of C++'s strong spots.

bert hubert
Running out of memory doesn't require a memory leak. It might simply mean that you have allocated more memory than address space allows. Or maybe you've reached a user limit.
Matt H