I recently had a problem with the oom-killer starting to kill processes after some time. I could see that memory had been consumed, but by the time I got to the server it was no longer clear what had consumed it. Is there a good, non-obvious place to get more information about the oom-killer? For example, detailed information about the processes running when it activated, about the processes it killed, and about the reasons for its choice?

I'm looking for a specific place to find this information, a specific tool to gather it, or some configuration to improve oom-killer reporting. I'm not looking for generic information about the oom-killer. By default, /var/log/messages only contains a detailed report on free and allocated memory, not on the specific processes that memory was allocated to.

+1  A: 

Typically you should get a message in /var/log/messages with quite a lot of detail about the process that was killed by the oom-killer.
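As a rough sketch of how to pull those entries out of a log, a small filter like the one below may help. The function name `oom_events` is made up for illustration, and the matched phrases are assumptions about typical kernel wording, which differs between kernel versions. If only memory totals show up, the `vm.oom_dump_tasks` sysctl (on kernels that provide it) controls whether a per-task table is logged as well.

```shell
# Keep only the lines of a log stream (stdin) that look like OOM-killer output.
# The patterns are assumptions about common kernel messages, not a complete list.
oom_events() {
    grep -Ei 'oom-killer|out of memory|killed process'
}

# Possible usage (log location depends on the distribution):
#   oom_events < /var/log/messages
#   dmesg | oom_events
# Check whether the kernel is set to dump a per-task table on OOM, if supported:
#   cat /proc/sys/vm/oom_dump_tasks
```

The filter only reads its input, so it can be tried on a copy of the log without root access.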

BigMikeD
Not really; for some reason I only see memory info.
Jevgeni Kabanov
+2  A: 

You can check the messages log file to see which process got killed and some related information. As for the reasons:

... the ideal candidate for liquidation is a recently started, non privileged process which together with its children uses lots of memory, has been nice'd, and does no raw I/O. Something like a nohup'd parallel kernel build (which is not a bad choice since all results are saved to disk and very little work is lost when a 'make' is terminated).

From here.

You can make some processes immune to the killer, adjust the swappiness parameter in case it is set too low (which makes the killer trigger-happy), and check for the things listed here.
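A minimal sketch of the "immune" part, assuming a kernel that exposes `/proc/<pid>/oom_score_adj` (older kernels use `/proc/<pid>/oom_adj` with a different value range instead). The helper names `oom_status` and `oom_exempt` and the `PROC_ROOT` variable are hypothetical, introduced here only so the functions can be tried against a scratch directory.

```shell
# Root of the proc filesystem; overridable so the helpers can be tested
# against an ordinary directory (PROC_ROOT is a hypothetical name).
PROC_ROOT="${PROC_ROOT:-/proc}"

# Print "pid oom_score oom_score_adj" for one process.
oom_status() {
    printf '%s %s %s\n' "$1" \
        "$(cat "$PROC_ROOT/$1/oom_score")" \
        "$(cat "$PROC_ROOT/$1/oom_score_adj")"
}

# Exempt a process from the OOM killer: -1000 is the documented
# "never kill" value for oom_score_adj. Writing it normally needs root.
oom_exempt() {
    echo -1000 > "$PROC_ROOT/$1/oom_score_adj"
}

# Possible usage:
#   oom_status 1234
#   sudo sh -c 'echo -1000 > /proc/1234/oom_score_adj'
#   cat /proc/sys/vm/swappiness     # current swappiness setting
```

Exempting too many processes just pushes the kill onto whatever is left, so this is probably best reserved for a few critical daemons.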

Vinko Vrsalovic
+1  A: 

This is not an exact answer to your question, but the malloc(3) man page on Linux has some information on how to turn off memory overcommit (echo 2 > /proc/sys/vm/overcommit_memory).
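Before changing that setting, it may be worth looking at the current mode and at how much memory is already committed. The sketch below assumes the usual meaning of the three `overcommit_memory` values; the function names `describe_overcommit` and `commit_usage` are made up for illustration.

```shell
# Translate an overcommit_memory value into a short description.
describe_overcommit() {
    case "$1" in
        0) echo "heuristic overcommit (kernel default)" ;;
        1) echo "always overcommit" ;;
        2) echo "strict accounting (no overcommit past CommitLimit)" ;;
        *) echo "unknown mode: $1" ;;
    esac
}

# Print the CommitLimit and Committed_AS lines from a meminfo-format file
# (defaults to /proc/meminfo).
commit_usage() {
    grep -E '^(CommitLimit|Committed_AS):' "${1:-/proc/meminfo}"
}

# Possible usage:
#   describe_overcommit "$(cat /proc/sys/vm/overcommit_memory)"
#   commit_usage
```

In mode 2, allocations start failing once Committed_AS reaches CommitLimit, so programs see malloc return NULL rather than being killed later.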

Thorsten79