views: 114

answers: 2

I have a Linux daemon that forks a few children and monitors them for crashes (restarting as needed). It would be great if the parent could also monitor the memory usage of the child processes - to detect memory leaks and restart a child when it goes beyond a certain size. How can I do this?

+3  A: 

You should be able to get detailed memory information out of /proc/{PID}/status:

Name:   bash
State:  S (sleeping)
Tgid:   6053
Pid:    6053
PPid:   6050
TracerPid:  0
Uid:    1007    1007    1007    1007
Gid:    1007    1007    1007    1007
FDSize: 256
Groups: 1007 
VmPeak:    48076 kB
VmSize:    48044 kB
VmLck:         0 kB
VmHWM:      4932 kB
VmRSS:      2812 kB
VmData:     2232 kB
VmStk:        84 kB
VmExe:       832 kB
VmLib:      6468 kB
VmPTE:       108 kB
Threads:    1
SigQ:   0/8190
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000001010
SigCgt: 0000000188020001
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
Cpus_allowed:   0f
Mems_allowed:   00000000,00000001
voluntary_ctxt_switches:    69227121
nonvoluntary_ctxt_switches: 19071

However, unless the leaks are dramatic, it's difficult to detect them by looking at process statistics, because malloc and free are heavily abstracted from the system calls (brk/sbrk) they ultimately map to - freed memory is not necessarily returned to the OS right away.

You can also check into /proc/${PID}/statm.

WhirlWind
There are no system calls to do that? Parsing files seems like a pretty dirty way to get the information.
Omry
It's how ps and friends get their information...
WhirlWind
So I guess that's the best way. Thanks!
Omry
A: 

You could try having a monitor script run vmstat in parallel with your process (note that this is not a good idea if you run the script multiple times, as you'll end up with multiple copies of vmstat). The monitor script can add the free memory to the buffer and cache sizes to get the amount of memory the OS has available, and track that. If it drops below some threshold, it can look for the biggest processes by calling ps -e -o ... (see the man page for details, but try vsz,pcpu,user,pid,args as a starting point).

I'd advise running this monitor as a separate process and having it kill the rogue process when it gets too large. You could restrict the set of processes monitored by using the

-u user-name

parameter to ps.

This is all a hack (in the UK sense), though - the right solution is to fix the leaks, assuming you have the code.

Nick
I prefer an integrated solution that does not rely on external programs/scripts. Of course, fixing the memory leak is the right thing to do, but in the real world you sometimes have to compromise temporarily. Also, I can envision cases where you run external code that is not under your control (think Apache running a PHP script).
Omry
The problem with a single, integrated solution is that it gets increasingly complicated. The advantage of having separate programs for separate functions is that each one is individually simple and easy to debug and deploy. An integrated solution seems good at first (no communication problems, you know it's running because the main program is running, etc.), but as your system gets bigger the simplicity issue will matter more and more.
Nick