Hi, I'd like to know what methods exist for checking the health of a process. Suppose 10,000 processes are running on a system and we have to make sure that if any of them goes down, it is brought back up.
Use the process ID (PID) to poll periodically whether the process is still alive; if it is dead, restart it.
However, with 10,000 processes you will probably hit the OS's process limit first. I suggest redesigning your program so that you don't need that many processes in the first place.
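A minimal sketch of such a liveness check in Python, assuming a POSIX system. The helper name pid_alive is made up for illustration. Sending signal 0 delivers nothing and only tests whether the PID exists; note that a zombie still counts as existing and that PIDs can be reused, so treat this as a rough check rather than a guarantee.

```python
import os

def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check, nothing is delivered
    except ProcessLookupError:
        return False     # no such process
    except PermissionError:
        return True      # the process exists but belongs to another user
    return True
```

A poller would call pid_alive(pid) for each monitored PID every few seconds and restart whatever reports False.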
Respawning processes that go down is usually handled by a dedicated launcher program that fork()s and exec()s the program, then waits for SIGCHLD, which indicates that the child process has ended.
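A rough sketch of such a launcher in Python, assuming a POSIX system. The names spawn and supervise and the max_restarts cap are made up for illustration. Instead of installing a SIGCHLD handler it simply blocks in os.wait(), which returns as soon as any child ends; a real launcher would probably also add a delay between restarts.

```python
import os

def spawn(argv):
    """Fork and exec argv; return the child's PID to the parent."""
    pid = os.fork()
    if pid == 0:  # in the child
        try:
            os.execvp(argv[0], argv)  # replaces the child image on success
        finally:
            os._exit(127)  # only reached if exec failed
    return pid

def supervise(commands, max_restarts=2):
    """commands maps a name to an argv list.

    Restart each command whenever it exits, up to max_restarts times each,
    and return the number of restarts performed per name.
    """
    restarts = {name: 0 for name in commands}
    running = {spawn(argv): name for name, argv in commands.items()}
    while running:
        pid, _status = os.wait()  # blocks until any child ends
        name = running.pop(pid, None)
        if name is None:
            continue  # a child we did not start ourselves
        if restarts[name] < max_restarts:
            restarts[name] += 1
            running[spawn(commands[name])] = name
    return restarts
```

The cap exists only so that the sketch terminates; a launcher meant to keep a service up would typically loop without one.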
For applications started at boot time (servers, etc.), daemons like upstart can do this for you automatically.
While others are pointing out that applications for this already exist (which you really should use unless you have a clear reason not to), I'll throw out a random idea for a custom solution.
If you control all N processes, give them all one shared memory area N bits large (so 10,000 processes need about 1.25 KB, not bad). When starting each process, give it a number i ranging from 0 to N-1. Every T seconds, each process sets bit i in the shared memory to 1. A monitoring process checks every k*T seconds that all N bits are 1, resetting them all to 0 as it does so. Any bit that is still 0 identifies a process that has missed its heartbeat.
This is still O(n), which you won't avoid, but the primitives are all really fast and should scale fine up to the OS thread limit.
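Here is a minimal sketch of that heartbeat table in Python. It deliberately uses one byte per process instead of one bit, because setting a single bit is a read-modify-write on a byte shared with seven other processes and Python has no atomic way to do it; the table then costs about 10 KB for 10,000 processes. The function names and the segment name passed to create_table are assumptions for illustration.

```python
from multiprocessing import shared_memory

def create_table(name, n):
    """Monitor side: create a shared table with one byte per process."""
    return shared_memory.SharedMemory(name=name, create=True, size=n)

def attach_table(name):
    """Worker side: attach to the table the monitor created."""
    return shared_memory.SharedMemory(name=name)

def beat(buf, i):
    """Called by process i every T seconds: mark slot i as alive."""
    buf[i] = 1

def sweep(buf, n):
    """Called by the monitor every k*T seconds.

    Return the indices that did not beat since the last sweep,
    then clear all n slots for the next round.
    """
    missing = [i for i in range(n) if buf[i] == 0]
    buf[:n] = bytes(n)  # reset every slot to 0
    return missing
```

Workers would call beat(table.buf, i) and the monitor sweep(table.buf, N). A beat that lands between the monitor's read and its reset is lost, which is one reason to pick k of at least 2 so that each process gets more than one chance per sweep.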
An alternative way to obtain i would be to just use the PID, but then the shared memory has to be larger (it will probably still be OK, though; for example, with the traditional Linux default PID limit of 32768, the table needs only 4 KB of bits).