Hi, a program named G09 runs in parallel using Linda. It spawns its parallel child processes on other nodes (hosts) like this:
/usr/bin/ssh -x compute-0-127.local -n /usr/local/g09l/g09/linda-exe/l1002.exel ...other_opts...
However, when the master node kills this process, the corresponding child process on the other node (compute-0-127 here) does not die but keeps running in the background. Right now I log in to each node that has these orphaned Linda processes and kill them manually with kill. Is there any way to kill such child processes automatically?
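A minimal sketch of automating the manual cleanup, assuming passwordless ssh to the compute nodes (which Linda itself relies on) and that the workers can be recognised by l1002.exel in their command line. The function name kill_remote_linda and the SSH override variable are hypothetical. Note that it kills every matching process of the user on each listed node, including processes that belong to other G09 jobs still running there.

```shell
#!/usr/bin/env bash
# Sketch: kill leftover Linda workers on the given remote nodes.
# Assumptions (hypothetical, not from G09/Linda documentation):
#  - passwordless ssh to every node;
#  - the workers have "l1002.exel" in their command line.
# SSH may be overridden (e.g. with a mock function) for testing.
kill_remote_linda() {
    local host
    for host in "$@"; do
        # -x: no X11 forwarding, -n: don't read stdin (same flags Linda uses).
        # pkill -f matches the full command line; -u limits it to our user.
        # The pattern is quoted for the remote shell, and "[.]" keeps the
        # regex from matching the literal text of this command line itself.
        "${SSH:-ssh}" -x -n "$host" "pkill -u $USER -f 'l1002[.]exel'"
    done
}
```

Usage, with the node names from the question: kill_remote_linda compute-0-146.local compute-0-147.local compute-0-149.local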
See pastebin 1 for the pstree output before killing the process, and pastebin 2 for the pstree output after the parent is killed:
pastebin1 - http://pastebin.com/yNXFR28V
pastebin2 - http://pastebin.com/ApwXrueh
Update (in reply to Answer 1)
Thanks, Martin, for the explanation. I tried the following:
killme() { kill 0; }
# Make calls to prepare for running G09
g09 < "$g09inp" > "$g09out" &
trap killme 'TERM'
wait
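kill 0 signals every process in the script's process group, on the local machine only. Two kinds of process can escape it: local descendants that put themselves into a different process group or session, and the commands that ssh started on the other nodes, which usually receive no signal when the local ssh client is killed because no terminal is allocated. A minimal sketch for the first case walks the process tree by parent PID; kill_tree is a hypothetical helper name, and the remote processes still need separate cleanup.

```shell
#!/usr/bin/env bash
# Sketch: send a signal to a process and to all of its descendants,
# found by parent PID (pgrep -P, from procps) instead of by process group.
# This also reaches children that moved to their own process group or
# session, but only while they are still descendants; processes that
# have already been reparented are not found.
kill_tree() {
    local pid=$1 sig=${2:-TERM} child
    # Handle the children before the parent: once the parent exits they
    # are reparented and `pgrep -P "$pid"` no longer lists them.
    for child in $(pgrep -P "$pid"); do
        kill_tree "$child" "$sig"
    done
    # Ignore processes that have already exited.
    kill -s "$sig" "$pid" 2>/dev/null || true
}

# Hypothetical use in the job script from the question:
#   g09 < "$g09inp" > "$g09out" &
#   g09pid=$!
#   trap 'kill_tree "$g09pid"' TERM
#   wait
```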
However, when Torque/Maui (which handles job execution) kills the job (this script) with qdel $jobid, the processes that G09 started with ssh -x $host -n keep running in the background. (Normal termination is not a problem, because G09 itself stops those processes.) Here is the pstree output before qdel:
bash
|-461.norma.iitb. /opt/torque/mom_priv/jobs/461.norma.iitb.ac.in.SC
| `-g09
| `-l1002.exe 1048576000Pd-C-C-addn-H-MO6-fwd-opt.chk
| `-cLindaLauncher/tmp/viaExecDataN6
| |-l1002.exel 1048576000Pd-C-C-addn-H-MO6-fwd-opt.ch
| | |-{l1002.exel}
| | |-{l1002.exel}
| | |-{l1002.exel}
| | |-{l1002.exel}
| | |-{l1002.exel}
| | |-{l1002.exel}
| | |-{l1002.exel}
| | `-{l1002.exel}
| |-ssh -x compute-0-149.local -n ...
| |-ssh -x compute-0-147.local -n ...
| |-ssh -x compute-0-146.local -n ...
| |-{cLindaLauncher}
| `-{cLindaLauncher}
`-pbs_demux
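To know which nodes need cleaning, the hostnames can be read from the local ssh command lines while the job is still running. A sketch, assuming input from ps -u "$USER" -o args= (one command line per line); linda_hosts is a hypothetical name. It takes the first ssh argument that does not start with "-" as the host, which fits both ssh -x HOST -n and ssh -x -n HOST as they appear in the pstree output, but would break for ssh options that take a value, such as -l.

```shell
#!/usr/bin/env bash
# Sketch: print the unique remote hosts found in ssh command lines read
# from stdin (e.g. from: ps -u "$USER" -o args=).
# Assumption: the host is the first argument after "ssh" that does not
# start with "-" (true for the -x/-n flags Linda uses).
linda_hosts() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "ssh" || $i ~ /\/ssh$/) {
                for (j = i + 1; j <= NF; j++) {
                    if ($j !~ /^-/) { print $j; break }
                }
                break
            }
        }
    }' | LC_ALL=C sort -u
}
```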
and after qdel, pstree still shows:
461.norma.iitb. /opt/torque/mom_priv/jobs/461.norma.iitb.ac.in.SC
`-ssh -x -n compute-0-149 rm\040-rf\040/state/partition1/trirag09/461
l1002.exel 1048576000Pd-C-C-addn-H-MO6-fwd-opt.ch
|-{l1002.exel}
|-{l1002.exel}
|-{l1002.exel}
|-{l1002.exel}
|-{l1002.exel}
|-{l1002.exel}
|-{l1002.exel}
`-{l1002.exel}
ssh -x compute-0-149.local -n /usr/local/g09l/g09/linda-exe/l1002.exel
ssh -x compute-0-147.local -n /usr/local/g09l/g09/linda-exe/l1002.exel
ssh -x compute-0-146.local -n /usr/local/g09l/g09/linda-exe/l1002.exel
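After qdel, the local l1002.exel and ssh processes no longer hang under the job script: they have been reparented, normally to PID 1 (systems that use a subreaper may show a different parent). Killing by name alone would also hit the same user's other G09 jobs on that node, so a sketch can restrict the selection to orphans. orphan_linda_pids is a hypothetical name; its input is assumed to be the output of ps -u "$USER" -o pid=,ppid=,args=.

```shell
#!/usr/bin/env bash
# Sketch: print the PIDs of orphaned Linda-related processes, i.e. lines
# "PID PPID COMMAND..." whose PPID is 1 and whose command line mentions
# l1002.exel (this matches both the workers and the ssh clients above).
# Assumption: orphans are reparented to PID 1.
orphan_linda_pids() {
    awk '$2 == 1 && /l1002\.exel/ { print $1 }'
}
```

For example: ps -u "$USER" -o pid=,ppid=,args= | orphan_linda_pids | xargs -r kill (the -r option of GNU xargs skips running kill when nothing matched).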
What am I doing wrong here? Is trap killme 'TERM' wrong?
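One way to check whether the trap itself is at fault is a small local test with the same shape as the job script, with sleep standing in for g09 (trap_demo is a hypothetical name). It relies on documented bash behaviour: when a signal that has a trap arrives while the shell is in wait, wait returns early and the trap runs. If this prints both lines, the trap mechanism works and the problem is more likely what kill 0 reaches. Keep in mind that Torque normally follows the SIGTERM from qdel with SIGKILL after a short delay (kill_delay), so a cleanup handler has limited time.

```shell
#!/usr/bin/env bash
# Sketch: run a child script shaped like the job script (background
# process + trap + wait), send it TERM, and show that the trap fires.
trap_demo() {
    bash -c '
        sleep 30 >/dev/null & child=$!   # stands in for g09
        trap "echo trapped TERM; kill \$child" TERM
        wait                             # returns early when TERM arrives
        echo "after wait"
    ' &
    local pid=$!
    sleep 0.5            # give the child time to install the trap
    kill -s TERM "$pid"
    wait "$pid"
}
```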