What to do when an out-of-memory error occurs?

Possible Duplicate:
What's the graceful way of handling out of memory situations in C/C++?

Hi,

This seems to be a simple question at first glance. And I don't want to start a huge discussion on what-is-the-best-way-to-do-this....

Context: Windows >= 5, 32 bit, C++, Windows SDK / Win32 API

But after asking a similar question, I read up on Win32 memory management in MSDN, and now I'm even more confused about what to do if an allocation fails, say in the C++ new operator.

So I'm now very interested in how (and, implicitly, whether) you implement error handling for OOM in your applications:
if, where (the main function?), for which operations (allocations), and how you handle an OOM error.

(I don't mean this subjectively or as a question of preference; I'd just like to see different approaches that account for different conditions or fit different situations. So feel free to offer answers for GUI apps, services, and other user-mode code.)

Some exemplary reactions to OOM to show what I mean:

GUI app: Message box, exit process
non-GUI app: Log error, exit process
service: try to recover, e.g. kill the thread that raised an exception, but continue execution
critical app: try again until an allocation succeeds (reducing the requested amount of memory)
hands off OOM, let STL / boost / OS handle it
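For the "critical app" reaction in the list above, a minimal C++ sketch of retrying with a smaller request, assuming the caller can do useful work with less memory than it first asked for. `Buffer` and `allocate_with_fallback` are hypothetical names; the sketch relies only on the standard nothrow form of `new`, which returns a null pointer instead of throwing `std::bad_alloc`.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical result type: the buffer plus how many bytes it really holds.
struct Buffer {
    std::unique_ptr<char[]> data;
    std::size_t size;
};

// Ask for `wanted` bytes; if that fails, halve the request and try again,
// giving up once it would drop below `minimum`. Returns an empty Buffer
// (null data, size 0) if even the smallest acceptable request fails.
Buffer allocate_with_fallback(std::size_t wanted, std::size_t minimum) {
    for (std::size_t n = wanted; n > 0 && n >= minimum; n /= 2) {
        // The nothrow form returns nullptr instead of throwing std::bad_alloc.
        char* p = new (std::nothrow) char[n];
        if (p != nullptr) {
            return Buffer{std::unique_ptr<char[]>(p), n};
        }
    }
    return Buffer{};
}
```

For example, `allocate_with_fallback(64 << 20, 1 << 20)` asks for 64 MiB and settles for anything down to 1 MiB; the caller has to check `size` to learn what it actually got.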

Thank you for your answers!

The best-explained way will receive the great honour of being an accepted answer :D - even if it only consists of a MessageBox line, but explains why everything else was useless, wrong or unnecessary.

Edit: I appreciate your answers so far, but I'm still missing an actual answer. Most of you say not to worry about OOM, since you can't do anything when there's no memory left (the system hangs or performs poorly). But does that mean avoiding any error handling for OOM? Or only doing a simple try-catch in main that shows a MessageBox?
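One way to read the "simple try-catch in main" option, sketched portably in C++: the program body runs inside a wrapper that turns an escaped `std::bad_alloc` into a short report and a failure exit code. `run_guarded` is a hypothetical helper, and writing to `stderr` stands in for the Win32 `MessageBox` a GUI app would show.

```cpp
#include <cstdio>
#include <cstdlib>
#include <functional>
#include <new>

// Hypothetical wrapper: run the program body and convert an escaped
// std::bad_alloc into a logged message and a failure exit code.
int run_guarded(const std::function<void()>& body) {
    try {
        body();
        return EXIT_SUCCESS;
    } catch (const std::bad_alloc&) {
        // A fixed string written with fputs avoids building new strings
        // on the heap while memory is short.
        std::fputs("Error: out of memory, exiting.\n", stderr);
        return EXIT_FAILURE;
    }
}
```

In a real program, `main` would simply `return run_guarded(...)` with the application's work inside the lambda; other exception types still propagate unchanged.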

+3 A:

You do the exact same thing you do when:

you created 10,000 windows
you allocated 10,000 handles
you created 2,000 threads
you exceeded your quota of kernel pool memory
you filled up the hard disk to capacity.

You send your customer a very humble message where you apologize for writing such crappy code and promise a delivery date for the bug fix. Anything else is not nearly good enough. How you want to be notified about it is up to you.

Hans Passant 2010-09-04 23:53:21

It's not necessarily a bug. If you state you need 1 gigabyte of memory for your game, and the user has only 512 megabytes available, it's the user's fault for not having enough memory to comply with the minimum requirements you have set up. This disregards unreasonable use of that 1 gigabyte, memory leaks, and the like.

strager 2010-09-04 23:57:59

@strager: look up "virtual memory".

Hans Passant 2010-09-05 00:03:42

You certainly have a point. But does that mean your opinion is to not catch an OOM error? My aim is to write good code, and I think that includes deciding to what extent error handling should be included.

DyP 2010-09-05 00:04:23

It is essentially an asynchronous error, it can strike anywhere, very hard to recover from. Your user is no happier, she still can't finish her task. Put your effort in fixing the design, not the band-aid.

Hans Passant 2010-09-05 00:10:27

Just to make sure I understand what you exactly mean: If an arbitrary OOM error occurs (caused by an OOM condition of the system), don't bother catching it. (This is different from the answer to: what to do if you try to allocate a large amount of memory and fail)

DyP 2010-09-05 00:37:50

OOM is never arbitrary, not on Windows. The machine would have to run out of disk space first. The user knows that, Windows sets off the 140 dB alarm before that happens. Let it die.

Hans Passant 2010-09-05 00:50:27

I meant that from the point of view of your process. When another process allocates a large amount of memory, there may be an OOM in your process you would not expect. In this case, it's not your fault, but you cannot do much (exit process, maybe msgbox). If your process raised that exception, it's either bad design (as you say) or an unexpected amount of memory required (where you could notify the user that operation was not possible). Besides, I did some small tests... and the system hung (was not operable) w/o alarm when the max size of the page file was approached.

DyP 2010-09-05 01:02:05

Yes, the machine will grind to a halt eventually. Your user will know why, she started them. There's little point in reminding her she did something dumb, she realized that half an hour ago. And rebooted the machine 15 minutes ago.

Hans Passant 2010-09-05 01:14:04

+3 A:

On most modern OSes, OOM will occur long after the system has become completely unusable, since before actually running out, the virtual memory system will start paging physical RAM out to make room for allocating additional virtual memory and in all likelihood the hard disk will begin to thrash like crazy as pages have to be swapped in and out at higher and higher frequencies.

In short, you have much more serious concerns to deal with before you go anywhere near OOM conditions.

Side note: At the moment, the above statement isn't as true as it used to be, since 32-bit machines with loads of physical RAM can exhaust their address space before they start to page. But this is still not common and is only temporary, as 64-bit ramps up and approaches mainstream adoption.

Edit: It seems that 64-bit is already mainstream. While perusing the Dell web site, I couldn't find a single 32-bit system on offer.

Marcelo Cantos 2010-09-04 23:54:51

Funny, I originally wanted to include the info from your side note in my question. Really. But then I thought this was part of the answer: if the virtual address range is full, you can still use the stack, MessageBoxes and so on, so handling or reporting this error is easy, but different from other OOM errors.

DyP 2010-09-05 00:01:26

@DyP: yes, but my main point is that your program will get into deep trouble when you exhaust *physical* memory and this usually happens long before you get hit with `std::bad_alloc`, which only occurs when you exhaust *virtual* memory. Even though these two events may happen closer together under certain conditions (e.g., a 32-bit system with lots of physical RAM), you really should focus on avoiding the more common case of physical memory exhaustion, which will automatically resolve the OOM case.

Marcelo Cantos 2010-09-05 00:10:22

Side note of mine: and I don't get support for my 64-bit OS on my Dell o.O Of course, the main reason for a 64-bit OS is that it supports more than 3 (4) GB of RAM. I think most modern PCs reach that limit easily.

DyP 2010-09-05 00:10:36

@Marcelo: I looked up the debug implementation of the new op in MSVC++ 9, which results in a call to HeapAlloc and throws a bad_alloc when it returns 0. Afaik, HeapAlloc also returns 0 if there's no physical memory left.

DyP 2010-09-05 00:20:39
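The comment above describes the allocator returning 0 and `new` turning that into `bad_alloc`. The C++ standard specifies the default `operator new` as a loop: on failure it calls the installed new-handler (if any) and retries, and throws `std::bad_alloc` only when no handler is installed. A portable sketch of that loop, with `std::malloc` standing in for `HeapAlloc` (this is not the MSVC CRT source, and `allocate_like_operator_new` is a hypothetical name):

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Sketch of the loop the standard specifies for the default ::operator new:
// try the underlying allocator, call the current new-handler on failure,
// and throw std::bad_alloc when no handler is installed.
void* allocate_like_operator_new(std::size_t size) {
    if (size == 0) size = 1;  // a zero-byte request must still yield a unique non-null pointer
    for (;;) {
        if (void* p = std::malloc(size)) return p;
        std::new_handler handler = std::get_new_handler();
        if (handler == nullptr) throw std::bad_alloc();
        handler();  // may free memory (then we retry), throw, or terminate
    }
}
```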

This isn't quite true. In my experience, most OOM errors occur when there's plenty of memory available in the virtual address space for the process. The problem is usually that the heap has become fragmented and it can't grow any more. There really aren't any good solutions to that (except to prevent the heap from fragmenting in the first place).

Larry Osterman 2010-09-05 02:12:47

@DyP: AFAIK, HeapAlloc is tied to the availability of virtual memory, not physical. Very few APIs deal with physical memory.

Marcelo Cantos 2010-09-05 02:32:28

@Larry: You are correct, but missing my point. Memory allocation success/failure is tied to the availability of *virtual*, not *physical*, memory. Whether virtual memory is exhausted due to the sheer volume of memory used or because of heavy fragmentation, the problem is roughly the same. A heavily fragmented heap is still likely to cause enormous paging pressure if in-use objects are scattered over the address space. Pathogens like repeated `realloc(size *= 2)` are bugs that should be dumped on the debugger's lap, not "managed" by some recovery routine. Either way, `bad_alloc` doesn't help.

Marcelo Cantos 2010-09-05 02:33:38

@Marcelo: I think the terms "virtual memory" and "physical memory" are misleading (at least, for me). Citing the MSDN: http://msdn.microsoft.com/en-us/library/aa366711(v=VS.85).aspx 5th paragraph, HeapAlloc does commit memory pages and therefore returns 0 => bad_alloc (for the new op, or a STATUS_NO_MEMORY exception) if there is no physical memory (I mean no RAM and no page file space) left. Unfortunately, this isn't well-documented on the HeapAlloc page itself.

DyP 2010-09-05 16:15:17

@Marcelo: Actually I'm not. Physical memory fragmentation happens all the time and isn't relevant to apps. I'm discussing virtual memory fragmentation - if the heap is filled with small discontiguous blocks and it can't allocate a new chunk of memory to hold more blocks, it will fail due to OOM.

Larry Osterman 2010-09-05 17:06:20

@DyP: I have never heard of the pagefile being referred to as "physical memory". In fact, the term "physical memory" is generally used to disambiguate between those chips that you push into the slots on the motherboard and the spinning disks that catch the overflow. (Would it be clearer if I said, "physical RAM"?) Conflating the two completely misses the thrust of my answer, which is that you should be much more concerned with avoiding touching those spinning disks than with what happens when the spinning disks fill up.

Marcelo Cantos 2010-09-06 09:30:51

@Larry: I was referring to virtual memory fragmentation too, so I'm afraid I'm a little lost as to what point you are trying to make. AFAICT, you seem to be referring to precisely the kinds of buggy scenarios my `realloc` example exemplifies. Please forgive me if, after accusing you of missing my point, I've gone and missed your point.

Marcelo Cantos 2010-09-06 09:40:45

@Marcelo: OK, my confusion arose from a 15-year-old Jeffrey Richter book on Win32 development where he lists the differences between MS-DOS, 16-bit and 32-bit Windows ^^ In this context, he uses the term "physical memory" as the opposite of "virtual memory" (meaning "virtual address range") on several occasions. His conclusion on paging is that a memory page is loaded from the page file into RAM only when a thread tries to access it (access violation -> OS SEH), not when it gains a time slice. But I don't think that's the whole truth (otherwise the system would not hang as easily when the page file is full).

DyP 2010-09-06 14:07:10

In my case: what do you do when an app fragments memory so badly that it cannot allocate the contiguous block needed to process a huge number of nodes?

Well, I split the processing up as much as I could.

For OOM, you can do the same thing, chop your processes up into as many pieces as possible and do them sequentially.

Of course, for handling the error until you get to fix it (if you can!), you typically let it crash. Then, once you determine that those memory allocations are failing (as you never expected), you put an error message directly in front of the user along the lines of "oh dear, it's all gone wrong; log a call with the support dept". In all cases, inform the user however you like, though it's established practice to use whatever mechanism the app already uses: if it writes to a log file, do that; if it displays an error dialog, do the same; if it uses the Windows 'send info to Microsoft' dialog, go right ahead and let that be the bearer of bad tidings. Users are expecting it, so don't try to be clever and do something else.

gbjbaanb 2010-09-04 23:59:41

It depends on your app, your skill level, and your time. If it needs to be running 24/7 then obviously you must handle it. It depends on the situation. Perhaps it may be possible to try a slower algorithm but one that requires less heap. Maybe you can add functionality so that if OOM does occur your app is capable of cleaning itself up, and so you can try again.

So I think the answer is 'ALL OF THE ABOVE!', apart from LET IT CRASH. You take pride in your work, right?

Don't fall into the 'there's loads of memory so it probably won't happen' trap. If every app writer took that attitude you'd see OOM far more often, and not all apps run on desktop machines. Take a mobile phone, for example: it's highly likely you'll run into OOM on a RAM-starved platform like that, trust me!

If all else fails display a useful message (assuming there's enough memory for a MessageBox!)

James 2010-09-05 00:04:51
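The "cleaning itself up, and so you can try again" idea can be sketched with the standard `std::set_new_handler` hook and a so-called rainy-day reserve: a block allocated at startup and released on the first failed `operator new`, which then retries. This is one possible technique, not something the answer prescribes; the `emergency_reserve` names are hypothetical, and the hook only covers `operator new`, not `malloc` or `HeapAlloc` called directly.

```cpp
#include <cstddef>
#include <new>

// Hypothetical "rainy-day fund": reserve a block at startup and hand it back
// to the heap the first time operator new fails, so that cleanup, saving,
// or an error message still has some memory to work with.
namespace emergency_reserve {

char* block = nullptr;

// Installed as the new-handler; operator new calls it when allocation fails.
void on_out_of_memory() {
    if (block != nullptr) {
        delete[] block;  // free the reserve; operator new then retries
        block = nullptr;
        return;
    }
    throw std::bad_alloc();  // reserve already spent: report failure as usual
}

void install(std::size_t bytes) {
    block = new char[bytes];
    std::set_new_handler(on_out_of_memory);
}

bool available() { return block != nullptr; }

}  // namespace emergency_reserve
```

Calling `emergency_reserve::install(1 << 20)` early in `main` sets aside 1 MiB; once it has been spent, the next failure throws `std::bad_alloc`, so whatever top-level handling the app has still applies.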

It seems a MessageBox doesn't even need further memory, see http://stackoverflow.com/questions/3533429/c-windows-how-to-report-an-out-of-memory-exception-bad-alloc. But if the OOM is because of physical memory exhaustion, you may not even be able to call anything (if it would require another stack page to be committed, for example).

DyP 2010-09-05 00:22:56

Ok fine, but what would you do if another app is leaking handles? It's all the same issue: dealing with a lack of resources. We used to use PGPDisk at work, and this program had a handle leak. After a few days our machines would become unusable. We were stuck: if the program wasn't running we couldn't access the encrypted drives! All you can do is make sure your code is as robust as possible: check error codes! OOM can also turn trivial exception throws into aborts.

James 2010-09-05 01:25:57

Actually there are scenarios where letting the app crash is the best solution (this is a recovery strategy known as "fail fast"). If there's an external monitor (like IIS or the Windows service controller), you can have the external monitor restart your application when it exits unexpectedly.

Larry Osterman 2010-09-05 02:14:13

I find that hard to accept, Larry

James 2010-09-05 02:36:54

+1 A:

Basically, you should do whatever you can to avoid having the user lose important data. If disk space is available, you might write out recovery files. If you want to be super helpful, you might allocate recovery files while your program is open, to ensure that they will be available in case of emergency.

supercat 2010-09-05 00:25:12
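A minimal sketch of preparing the recovery file ahead of time, assuming plain C stdio is acceptable: the file is opened at startup and given a stdio buffer that lives inside the object (via `setvbuf`), so an emergency save should need neither new heap memory nor a new file handle. `RecoveryFile` and the file name used with it are hypothetical.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical recovery-file holder: opened (and given its stdio buffer)
// at startup, so a later emergency save does not have to allocate.
class RecoveryFile {
public:
    explicit RecoveryFile(const char* path) : file_(std::fopen(path, "wb")) {
        if (file_ != nullptr) {
            // Must happen before any other operation on the stream.
            std::setvbuf(file_, buffer_, _IOFBF, sizeof buffer_);
        }
    }
    ~RecoveryFile() {
        if (file_ != nullptr) std::fclose(file_);  // buffer_ is still alive here
    }
    RecoveryFile(const RecoveryFile&) = delete;
    RecoveryFile& operator=(const RecoveryFile&) = delete;

    bool ok() const { return file_ != nullptr; }

    // Append `size` bytes of application state and flush them to the OS.
    bool save(const void* data, std::size_t size) {
        if (file_ == nullptr) return false;
        return std::fwrite(data, 1, size, file_) == size && std::fflush(file_) == 0;
    }

private:
    std::FILE* file_;
    char buffer_[4096];
};
```

Each `save` appends and flushes to the OS; it does not force the data to disk, so it may still sit in OS caches if the whole machine goes down.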

+1. If you do have an app that needs to save results or settings, this is a reason to try to catch all errors and back up data. Besides running a backup thread in the background, it may be worth introducing OOM error handling for these kinds of apps (which have large data sets that may change in a short time). Special case, but useful answer.

DyP 2010-09-05 00:54:26

+1 A:

Simply display a message or dialog box (depending on whether you're in a terminal or window system), saying "Error: Out of memory", possibly with debugging info, and include an option for your user to file a bug report, or a web link to where they can do that.

If you're really out of memory then, in all honesty, there's no point doing anything other than gracefully exiting; trying to handle the error is useless, as there is nothing you can do.

Joe D 2010-09-05 00:32:51

+1 for clearness and simpleness (and for avoiding sarcasm ^^)

DyP 2010-09-05 00:42:35
