Is there an easy way to limit the number of concurrent jobs in bash? By that I mean making & block when more than n concurrent jobs are already running in the background.

I know I can implement this with ps | grep-style tricks, but is there an easier way?

+7  A: 

A small bash script could help you:

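# content of script exec-async.sh
# Source this file (". exec-async.sh cmd args...") so jobs sees the caller's jobs.
joblist=($(jobs -pr))          # PIDs of background jobs that are still running
while (( ${#joblist[@]} >= 3 ))
do
    sleep 1                    # poll once per second until a slot frees up
    joblist=($(jobs -pr))
done
"$@" &                         # start the command, preserving argument quoting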

If you call:

. exec-async.sh sleep 10

...four times, the first three calls will return immediately; the fourth call will block until there are fewer than three jobs running.

You need to run this script inside the current shell by sourcing it (prefixing it with .), because jobs lists only the jobs of the shell it runs in.
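
A quick way to see why sourcing matters is to compare what a separate bash process and the current shell each report. This is only an illustrative sketch; the variable names child_count and own_count are made up for the example.

```bash
#!/bin/bash
sleep 1 &    # one background job owned by this shell

# A separate bash process starts with its own, empty job table.
child_count=$(bash -c 'j=($(jobs -p)); echo "${#j[@]}"')

# The current shell can see the job it started.
own=($(jobs -p))
own_count=${#own[@]}

echo "child shell: ${child_count} job(s); current shell: ${own_count} job(s)"
wait         # let the background sleep finish before exiting
```

A script started as ./exec-async.sh would be in the same position as the child shell here, so its loop would never see the caller's jobs.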

The sleep inside is ugly, but I didn't find a way to wait for the first job that terminates.
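
For what it's worth, bash 4.3 and later provide wait -n, which returns as soon as any one background job exits. A minimal sketch of the same idea without polling, assuming such a bash version is available (run_limited and max_jobs are names invented for this example):

```bash
#!/bin/bash
max_jobs=3   # hypothetical limit on concurrent jobs

run_limited() {
    local running=($(jobs -pr))
    # While every slot is taken, block until any one job exits.
    while (( ${#running[@]} >= max_jobs )); do
        wait -n                  # requires bash 4.3 or newer
        running=($(jobs -pr))
    done
    "$@" &
}

start=$SECONDS
for i in 1 2 3 4 5 6; do
    run_limited sleep 1
done
wait                             # wait for the remaining jobs
echo "all jobs done after about $((SECONDS - start))s"
```

Defining the limiter as a function keeps it in the current shell, so no sourcing trick is needed.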

tangens
A: 

Have you considered starting ten long-running listener processes and communicating with them via named pipes?
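
As a rough illustration of what a named pipe can do here, the sketch below uses one as a pool of tokens. Note that this is a simpler, related technique rather than the listener-process design suggested above: each job takes a token before it starts and puts it back when it ends. It assumes that opening a FIFO read-write does not block, which holds on Linux but is not guaranteed everywhere; run_limited, max_jobs and the pipe path are invented for the example.

```bash
#!/bin/bash
max_jobs=3                       # hypothetical limit on concurrent jobs

dir=$(mktemp -d)                 # scratch directory for the pipe
fifo="$dir/tokens"
mkfifo "$fifo"
exec 3<>"$fifo"                  # read-write open, so the open itself does not block (Linux)
rm -rf "$dir"                    # fd 3 keeps the pipe alive after the name is gone

# Load one token per allowed job.
for ((i = 0; i < max_jobs; i++)); do
    printf 'x' >&3
done

run_limited() {
    local token
    read -r -n 1 -u 3 token      # blocks until a token is available
    {
        "$@"
        printf 'x' >&3           # hand the token back when the job ends
    } &
}

start=$SECONDS
for i in 1 2 3 4 5 6; do
    run_limited sleep 1
done
wait
echo "all jobs done after about $((SECONDS - start))s"
```

The blocking happens in the read, so there is no polling loop and no dependence on the jobs builtin.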

Steven Huwig
A: 

You can use ulimit -u; see http://ss64.com/bash/ulimit.html

Shay
The only problem with this is that it will cause the processes to die rather than block and wait, which is the desired behavior.
Benj
This solution is dangerous and hard to control. Since my shell scripts tend to contain a lot of subshell expansion and piping, each line typically needs 4+ processes. When you set the ulimit for the entire process, it limits not only how many jobs can execute but also the processes needed to run the rest of the script, causing things to block or fail unpredictably.
amphetamachine
+1  A: 

If you're willing to do this outside of pure bash, you should look into a job queuing system.

For instance, there's GNU queue or PBS. And for PBS, you might want to look into Maui for configuration.

Both systems will require some configuration, but it's entirely possible to allow a specific number of jobs to run at once, only starting newly queued jobs when a running job finishes. Typically, these job queuing systems would be used on supercomputing clusters, where you would want to allocate a specific amount of memory or computing time to any given batch job; however, there's no reason you can't use one of these on a single desktop computer without regard for compute time or memory limits.

Mark Rushakoff
+3  A: 

The following script shows a way to do this with functions. You can either put the bgxupdate and bgxlimit functions in your script or have them in a separate file which is sourced from your script with:

. /path/to/bgx.sh

It has the advantage that you can maintain multiple groups of processes independently (you can run, for example, one group with a limit of 10 and another totally separate group with a limit of 3).

It uses the bash built-in jobs to get a list of sub-processes, but maintains them in individual variables. In the loop at the bottom, you can see how to call the bgxlimit function:

  • set up an empty group variable.
  • transfer that to bgxgrp.
  • call bgxlimit with the limit and command you want to run.
  • transfer the new group back to your group variable.

Of course, if you only have one group, just use bgxgrp directly rather than transferring in and out.

#!/bin/bash

# bgxupdate - update active processes in a group.
#   Works by transferring each process to new group
#   if it is still active.
# in:  bgxgrp - current group of processes.
# out: bgxgrp - new group of processes.
# out: bgxcount - number of processes in new group.

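bgxupdate() {
    bgxoldgrp=${bgxgrp}
    bgxgrp=""
    bgxcount=0
    # PIDs of running jobs, space-delimited with a space at both ends
    # so that each PID can be matched as a whole word.
    bgxjobs=" $(jobs -pr | tr '\n' ' ')"
    for bgxpid in ${bgxoldgrp} ; do
        # Keep the PID only if it is still among the running jobs.
        if [[ "${bgxjobs}" == *" ${bgxpid} "* ]] ; then
            bgxgrp="${bgxgrp} ${bgxpid}"
            ((bgxcount = bgxcount + 1))
        fi
    done
}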

# bgxlimit - start a sub-process with a limit.
#   Loops, calling bgxupdate until there is a free
#   slot to run another sub-process. Then runs it
#   and updates the process group.
# in:  $1     - the limit on processes.
# in:  $2+    - the command to run for new process
#               ("-" means only wait for a free slot).
# in:  bgxgrp - the current group of processes.
# out: bgxgrp - new group of processes.

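bgxlimit() {
    bgxmax=$1 ; shift
    bgxupdate
    # Poll once per second until the group is below its limit.
    while [[ ${bgxcount} -ge ${bgxmax} ]] ; do
        sleep 1
        bgxupdate
    done
    # A lone "-" means "just wait"; anything else is run in the background.
    if [[ "$1" != "-" ]] ; then
        "$@" &
        bgxgrp="${bgxgrp} $!"
    fi
}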

# Test program, create group and run 6 sleeps with
#   limit of 3.

group1=""
echo 0 $(date +%H:%M:%S) '[' ${group1} ']'
echo
for i in 1 2 3 4 5 6 ; do
    bgxgrp=${group1} ; bgxlimit 3 sleep ${i}0 ; group1=${bgxgrp}
    echo ${i} $(date +%H:%M:%S) '[' ${group1} ']'
done

# Wait until all others are finished.

echo
bgxgrp=${group1} ; bgxupdate ; group1=${bgxgrp}
while [[ ${bgxcount} -ne 0 ]] ; do
    oldcount=${bgxcount}
    while [[ ${oldcount} -eq ${bgxcount} ]] ; do
        sleep 1
        bgxgrp=${group1} ; bgxupdate ; group1=${bgxgrp}
    done
    echo 9 $(date +%H:%M:%S) '[' ${group1} ']'
done

Here's a sample run:

0 12:38:00 [ ]

1 12:38:00 [ 3368 ]
2 12:38:00 [ 3368 5880 ]
3 12:38:00 [ 3368 5880 2524 ]
4 12:38:10 [ 5880 2524 1560 ]
5 12:38:20 [ 2524 1560 5032 ]
6 12:38:30 [ 1560 5032 5212 ]

9 12:38:50 [ 5032 5212 ]
9 12:39:10 [ 5212 ]
9 12:39:30 [ ]

  • The whole thing starts at 12:38:00 and, as you can see, the first three processes run immediately.
  • Process n sleeps for n*10 seconds, so the fourth process doesn't start until the first exits (at time t=10, or 12:38:10). You can see that process 3368 has disappeared from the list before 1560 is added.
  • Similarly, the fifth process (5032) starts when the second (5880) exits at time t=20.
  • And finally, the sixth process (5212) starts when the third (2524) exits at time t=30.
  • Then the rundown begins: the fourth process exits at t=50 (started at 10, duration of 40), the fifth at t=70 (started at 20, duration of 50) and the sixth at t=90 (started at 30, duration of 60).

Or, in time-line form:

Process:  1  2  3  4  5  6 
--------  -  -  -  -  -  -
12:38:00  ^  ^  ^
12:38:10  v  |  |  ^
12:38:20     v  |  |  ^
12:38:30        v  |  |  ^
12:38:40           |  |  |
12:38:50           v  |  |
12:39:00              |  | 
12:39:10              v  |
12:39:20                 |
12:39:30                 v
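
The start and exit times in the walkthrough above can be checked with a little arithmetic. The sketch below is not part of the bgx script; it simulates the schedule (three slots, durations of 10 to 60 seconds) without running or sleeping on anything, and it ignores the up-to-one-second polling delay. All variable names are invented for the example.

```bash
#!/bin/bash
limit=3
durations=(10 20 30 40 50 60)    # process n sleeps for n*10 seconds

slots=()                         # time at which each slot becomes free
for ((s = 0; s < limit; s++)); do slots[s]=0; done

starts=()
ends=()
for ((j = 0; j < ${#durations[@]}; j++)); do
    # Pick the slot that frees up first (ties go to the lowest index).
    best=0
    for ((s = 1; s < limit; s++)); do
        if (( slots[s] < slots[best] )); then best=$s; fi
    done
    start=${slots[best]}
    end=$(( start + durations[j] ))
    slots[best]=$end
    starts+=("$start")
    ends+=("$end")
    echo "process $((j + 1)): starts at t=${start}, exits at t=${end}"
done
```

Each new process simply takes over the slot of whichever earlier process finishes first, which is the behavior the sample run shows.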
paxdiablo
Very nice, thank you!
static_rtti