How can a bash script wait for several subprocesses spawned from it to finish, and return an exit code !=0 when any of the subprocesses ends with code !=0?

Simple script:

#!/bin/bash
for i in $(seq 0 9); do
  doCalculations "$i" &
done
wait

The above script will wait for all 10 spawned subprocesses, but it will always give exit status 0 (see `help wait`). How can I modify this script so that it discovers the exit statuses of the spawned subprocesses and returns exit code 1 when any of them ends with code !=0?
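A quick way to see the behavior described above: with no arguments, `wait` waits for every child but reports 0 even when one of them failed. This is a minimal sketch using `true` and `false` as stand-in children; `wait_status` is just an illustrative variable name.

```shell
#!/bin/bash
# Two background children: one succeeds, one fails.
true &
false &
wait                  # no arguments: waits for every child
wait_status=$?        # 0, even though `false` exited with 1
echo "wait returned $wait_status"
```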

Is there a better solution than collecting the PIDs of the subprocesses, waiting for them in order, and summing the exit statuses?

+13  A: 

wait also (optionally) takes the PID of the process to wait for, and $! gives you the PID of the last command launched in the background. Modify the loop to store the PID of each spawned subprocess in an array, and then loop again, waiting on each PID.
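A minimal sketch of the approach this answer describes, assuming Bash arrays. The `doCalculations` body here is a hypothetical stand-in (it fails for odd arguments), and `run_all` is an illustrative name, not part of the question's script:

```shell
#!/bin/bash
# Hypothetical stand-in for the question's doCalculations:
# succeeds for an even $1, fails for an odd $1.
doCalculations() { sleep 0.1; (( $1 % 2 == 0 )); }

run_all() {
  local pids=() pid i rc=0
  for i in $(seq 0 9); do
    doCalculations "$i" &
    pids+=("$!")            # $! = PID of the job just started
  done
  for pid in "${pids[@]}"; do
    # `wait PID` returns that particular child's exit status.
    wait "$pid" || rc=1
  done
  return "$rc"
}

run_all
echo "run_all returned $?"
```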

Luca Tettamanti
How can you loop on 'wait', when that makes the script block until that specific process has died?
Alnitak
Well, since you are going to wait for all the processes, it doesn't matter if, e.g., you are waiting on the first one while the second has already finished (the second will be picked up at the next iteration anyway). It's the same approach that you'd use in C with wait(2).
Luca Tettamanti
Ah, I see - different interpretation :) I read the question as meaning "return exit code 1 _immediately_ when any of subprocesses exit".
Alnitak
One thing, though: doesn't this risk a race condition if you're specifying PIDs, that PID dies, and then another process is spawned with the same PID?
Alnitak
Hum, I interpreted the code in the question as a barrier. As you said, apparently there's no way to wait for "any" child...
Luca Tettamanti
About the race: with wait(2) the PID won't be reused until it has been waited upon (it's a zombie); with bash scripts the doc is not very clear, but it seems (I tried...) that the shell waits for the PID and stores the return value for later use - so the PID may be reused :|
Luca Tettamanti
PID may be reused indeed, but you cannot wait for a process that is not a child of the current process (wait fails in that case).
tkokoszka
You can also use %n to refer to the nth backgrounded job, and %% to refer to the most recent one.
conny
+1  A: 

I don't believe it's possible with Bash's builtin functionality.

You can get notification when a child exits:

#!/bin/bash
set -o monitor        # enable script job control
trap 'echo "child died"' CHLD

However there's no apparent way to get the child's exit status in the signal handler.

Getting that child status is usually the job of the wait family of functions in the lower level POSIX APIs. Unfortunately Bash's support for that is limited - you can wait for one specific child process (and get its exit status) or you can wait for all of them, and always get a 0 result.

What it appears impossible to do is the equivalent of waitpid(-1), which blocks until any child process returns.
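Bash 4.3 and later do provide something close to waitpid(-1): `wait -n` blocks until the next background job exits and returns that job's exit status. A minimal sketch of a fail-fast loop, assuming Bash 4.3+ and a `sleep` that accepts fractional seconds; `run_fail_fast` and its arguments (the exit code each dummy child should return) are hypothetical:

```shell
#!/bin/bash
# Assumes Bash 4.3+ (`wait -n`) and a `sleep` that accepts fractions.
# Each argument is the exit code one dummy child should return
# (hypothetical workload; at most 9 arguments so the sleeps stay staggered).
run_fail_fast() {
  local code k=0
  for code in "$@"; do
    k=$((k + 1))
    # Child k sleeps 0.k seconds, then exits with the requested code.
    (sleep "0.$k"; exit "$code") &
  done
  while ((k > 0)); do
    # wait -n returns the exit status of whichever job finishes next.
    wait -n || return 1   # stop at the first failure
    k=$((k - 1))
  done
  return 0
}

run_fail_fast 0 0 0 && echo "all succeeded"
run_fail_fast 0 5 0 || echo "a child failed"
```

Returning early leaves the remaining children running; a caller that needs a clean exit would still `wait` for (or kill) them afterwards.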

Alnitak
A: 
Cirno de Bergerac
There shouldn't be any issues with multiple appenders, though return values may be written out of order, so you don't know which process returned what...
Luca Tettamanti
You could just send identification info with the statuses. At any rate, OP only wanted to know if *any* of the subprocesses returned with status ≠ 0 without regard to which ones specifically.
Cirno de Bergerac
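The comments above discuss having each child append its exit status to a shared file, tagged with identification info. A minimal sketch of that idea; `work`, `run_with_status_file`, and the `mktemp` temporary file are illustrative assumptions:

```shell
#!/bin/bash
# Hypothetical workload: sleep briefly, then return the status given as $1.
work() { sleep 0.1; return "$1"; }

# Each argument is the exit code one child should return.
run_with_status_file() {
  local statusfile code id=0 rc=0
  statusfile=$(mktemp) || return 2
  for code in "$@"; do
    # The child appends one short line, "<id> <exit status>", when done.
    ( work "$code"; echo "$id $?" >> "$statusfile" ) &
    id=$((id + 1))
  done
  wait   # no PIDs given: waits for all children, always returns 0
  # awk exits 0 only if some line has a non-zero second field.
  if awk '$2 != 0 { bad = 1 } END { exit (bad ? 0 : 1) }' "$statusfile"; then
    rc=1
  fi
  rm -f "$statusfile"
  return "$rc"
}

run_with_status_file 0 0 0; echo "status: $?"
run_with_status_file 0 3 0; echo "status: $?"
```

Because each line carries an id, the file also tells you which child failed, not just whether one did.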
+2  A: 

http://jeremy.zawodny.com/blog/archives/010717.html :

#!/bin/bash

FAIL=0

echo "starting"

./sleeper 2 0 &
./sleeper 2 1 &
./sleeper 3 0 &
./sleeper 2 0 &

for job in $(jobs -p)
do
    echo "$job"
    wait "$job" || let "FAIL+=1"
done

echo "$FAIL"

if [ "$FAIL" == "0" ];
then
    echo "YAY!"
else
    echo "FAIL! ($FAIL)"
fi
`jobs -p` gives the PIDs of subprocesses that are still running. It will skip a process if that process finishes before `jobs -p` is called, so if any subprocess ends before `jobs -p`, its exit status will be lost.
tkokoszka
+1  A: 

Here's what I've come up with so far. I would like to see how to interrupt the sleep command if a child terminates, so that one would not have to tune WAITALL_DELAY to one's usage.

waitall() { # PID...
  ## Wait for children to exit and indicate whether all exited with 0 status.
  local errors=0
  while :; do
    debug "Processes remaining: $*"
    for pid in "$@"; do
      shift
      if kill -0 "$pid" 2>/dev/null; then
        debug "$pid is still alive."
        set -- "$@" "$pid"
      elif wait "$pid"; then
        debug "$pid exited with zero exit status."
      else
        debug "$pid exited with non-zero exit status."
        ((++errors))
      fi
    done
    (("$#" > 0)) || break
    # TODO: how to interrupt this sleep when a child terminates?
    sleep ${WAITALL_DELAY:-1}
  done
  ((errors == 0))
}

debug() { echo "DEBUG: $*" >&2; }

pids=""
for t in 3 5 4; do 
  sleep "$t" &
  pids="$pids $!"
done
waitall $pids
Mark Edgar
One could possibly skip WAITALL_DELAY or set it very low: since no processes are started inside the loop, I don't think it would be too expensive.
Marian