Ok, I was running POV-Ray on all the demos, but POV's still single-threaded and wouldn't utilize more than one core. So, I started thinking about a solution in BASH.

I wrote a general function that takes a list of commands and runs them in the designated number of sub-shells. This actually works but I don't like the way it handles accessing the next command in a thread-safe multi-process way:

  • It takes, as an argument, a file with commands (one per line),
  • To get the "next" command, each process ("thread") will:
    • wait until it can create a lock file, with: ln $CMDFILE $LOCKFILE
    • read the command from the file,
    • modify $CMDFILE by removing the first line,
    • remove the $LOCKFILE.

Is there a cleaner way to do this? I couldn't get the sub-shells to read a single line from a FIFO correctly.


Incidentally, the point of this is to enhance what I can do on a BASH command line, and not to find non-bash solutions. I tend to perform a lot of complicated tasks from the command line and want another tool in the toolbox.

Meanwhile, here's the function that handles getting the next line from the file. As you can see, it modifies an on-disk file each time it reads/removes a line. That's what seems hackish, but I'm not coming up with anything better, since FIFOs didn't work w/o setvbuf() in bash.

#
# Get/remove the first line from FILE, using LOCK as a semaphore (with
# short sleep for collisions).  Returns the text on standard output,
# returns zero on success, non-zero when file is empty.
#
parallel__nextLine() 
{
  local line rest file=$1 lock=$2

  # Wait for lock...
  until ln "${file}" "${lock}" 2>/dev/null
  do sleep 1
     [ -s "${file}" ] || return $?
  done

  # Open, read one "line", save the "rest" back to the file:
  exec 3<"$file"
  read line <&3 ; rest=$(cat <&3)
  exec 3<&-

  # After last line, make sure file is empty:
  ( [ -z "$rest" ] || echo "$rest" ) > "${file}"

  # Remove lock and 'return' the line read:
  rm -f "${lock}"
  [ -n "$line" ] && echo "$line"
}
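
The driver that hands the command file to the sub-shells isn't shown here; a minimal sketch of how worker sub-shells might consume parallel__nextLine could look like the following (parallel__run, cmdfile and nworkers are illustrative names, not part of the actual script):

# Sketch only: start N worker sub-shells, each pulling commands from the
# shared file until it is exhausted, then wait for all of them.
parallel__run()
{
  local cmdfile=$1 nworkers=$2 lock="${cmdfile}.lock" i line

  for ((i = 0; i < nworkers; i++)); do
    (
      # each worker loops until parallel__nextLine reports an empty file
      while line=$(parallel__nextLine "$cmdfile" "$lock"); do
        eval "$line"
      done
    ) &
  done
  wait   # block until every worker sub-shell has finished
}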
+1  A: 

If you need real thread safety, I would recommend migrating to a better scripting system.

With Python, for example, you can create real threads with safe synchronization using semaphores and queues.

rossoft
A: 

I believe you're actually forking processes here, and not threading. I would recommend looking for threading support in a different scripting language like Perl, Python, or Ruby.

dlamblin
Indeed, it does spawn sub-shells. I thought it was clear in the text that I meant the term "thread" conceptually; I did some edits to make that clearer.
NVRAM
Process forking is fairly cheap in Unix-based systems, and separate processes are often referred to as "threads", inaccurately. In any case, separate processes are often better than threads, since they're more self-contained.
David Thornley
I agree that forked processes may actually be the better approach than threads.
dlamblin
But having separate process memory precludes sharing a variable as could be done w/threads. Hence, the use of files...
NVRAM
+3  A: 
#adjust these as required
args_per_proc=1 #1 is fine for long running tasks
procs_in_parallel=4

xargs -n$args_per_proc -P$procs_in_parallel povray < list

Note that the nproc command, coming soon to coreutils, will auto-determine the number of available processing units, which can then be passed to -P.
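
For example, once nproc is available (or substituting a processor count obtained some other way), the two can be combined roughly like this:

# sketch: let nproc choose the -P value automatically
procs_in_parallel=$(nproc)
xargs -n"$args_per_proc" -P"$procs_in_parallel" povray < list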

pixelbeat
My question really is about programming in bash, not just getting the jobs to run in parallel. But this is good info - thanks!
NVRAM
BTW, **grep -c ^processor /proc/cpuinfo** seems to work fine on Linux if **/proc** is mounted, although I don't have a lot of machines on which to test it.
NVRAM
Yes, that will list the number of online CPUs in the system. The number available to a process may be smaller, though, due to a previous taskset, for example.
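
For example, once nproc is available, the difference is easy to see on a multi-core Linux box with taskset installed:

taskset -c 0 nproc                              # honours the affinity mask: prints 1
taskset -c 0 grep -c ^processor /proc/cpuinfo   # still prints the full online count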
pixelbeat
I'd misread your "coming soon" as if it applied to **xargs**. I'll accept this even if it isn't quite what I expected, and doesn't allow shell built-ins or script functions.
NVRAM
A: 

Sorry to bump this after so long, but I pieced together a fairly good solution for this, IMO. It doesn't work perfectly, but it will limit the script to a certain number of child tasks running, and then wait for all the rest at the end.

#!/bin/bash

pids=()

# Wait for the oldest PIDs until no more than 6 jobs remain,
# then keep the survivors in the global pids array.
thread() {
  local this
  while [ ${#} -gt 6 ]; do
    this=${1}
    wait "$this"
    shift
  done
  pids=($1 $2 $3 $4 $5 $6)
}

for i in 1 2 3 4 5 6 7 8 9 10
do
  sleep 5 &                     # stand-in for the real task
  pids=( ${pids[@]-} $! )       # record the new child's PID
  thread ${pids[@]}             # throttle if too many are running
done

# Wait for whatever is still running before exiting.
for pid in ${pids[@]}
do
  wait "$pid"
done

It seems to work great for what I'm doing (handling parallel uploading of a bunch of files at once) and keeps it from breaking my server, while still making sure all the files get uploaded before the script finishes.
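
Adapted to that upload case, the middle loop might look roughly like this ($remote and the file glob are placeholders, not part of the script above), followed by the same final wait loop:

# sketch: same throttling pattern, with the sleep replaced by an upload
for f in *.tar.gz
do
  scp "$f" "$remote:incoming/" &
  pids=( ${pids[@]-} $! )
  thread ${pids[@]}
done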

KageUrufu