Hello!

I want to update a large number of SVN-versioned projects at once, using a script. It takes very long to run the update jobs one by one.

So I tried to run the jobs in parallel. It seems to work, but I'm not sure whether it's done correctly. Perhaps there are concurrency issues I didn't think of?

Please take a look at the script:

#!/bin/sh

time (
 for f in `ls -d */`
 do
  (
   OUTPUT=`svn update $f`
   echo -e "= = = = = = = = = = $f \n$OUTPUT"
  ) &
 done

 wait
)

When I don't store the output first, it all comes out mixed up.

Do you think it's OK this way?

NOTE: The speed-up was really about a factor of 20 for 40 projects when there isn't a lot to update.

A: 

It should be fine. Does your hard disk thrash a lot when you do this kind of concurrent update? It might if there are a lot of changes to apply.

Andrew Keith
I did a sync afterwards; it sometimes takes 10 sec ;)
ivan_ivanovich_ivanoff
+1  A: 

Yes, it's expected that the output gets mixed up during parallel execution unless special precautions (like storing the output, as in your case) are taken.

The console doesn't wait for the whole output of a command before displaying it. Note that when you run svn update with nothing in parallel, the lines are printed one by one as Subversion fetches and merges the files. So when two svn invocations are working at the same time, each of them prints its lines one by one, and the output is a mixture of the lines printed by both.
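To see the effect without Subversion at all, here is a small illustration (just a demo, nothing svn-specific): two background jobs share the script's stdout, and their lines come out interleaved:

#!/bin/sh
# Two background jobs printing to the same stdout; the sleeps make
# the interleaving easy to observe.
( for i in 1 2 3; do echo "first  job, line $i"; sleep 1; done ) &
( for i in 1 2 3; do echo "second job, line $i"; sleep 1; done ) &
wait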

Pavel Shved
+5  A: 

Your output will still be mixed up if two jobs happen to complete at the same time. You'd be better off writing the output to files and then cat'ting the files at the end of the run:

#!/bin/sh
outdir="/tmp/output$$" # probably ought to be chosen with e.g. mktemp

trap 'rm -rf "$outdir"' EXIT # Clean up on exit, even if Ctrl-C

time (
    mkdir "$outdir"
    for f in `ls -d */` # You have issues with filenames with space in, here
    do
            (
                    echo -e "= = = = = = = = = = $f"
                    svn update "$f"
            ) > "$outdir/${f%/}" &   # ${f%/} strips the trailing slash so this names a file, not a directory
    done

    wait
    cat "$outdir"/*   # print each job's captured output in one piece
)
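As the comment at the top suggests, mktemp is the safer way to pick the output directory, since it creates a fresh directory with an unpredictable name. A minimal variant of the first two lines (drop the mkdir in that case, since mktemp already creates the directory):

outdir=$(mktemp -d) || exit 1    # mktemp -d creates a unique directory and prints its path
trap 'rm -rf "$outdir"' EXIT     # clean up on exit, even on Ctrl-C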

Aside from that, my concern would be that you're not explicitly limiting the number of jobs you run, but that's fairly hard to achieve in a shell script. Provided you're confident that you're looking at 40 directories and not, say, 10,000, I don't see it being too much of an issue.
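For what it's worth, a crude cap is possible even in plain sh by running the updates in batches and waiting between them. A sketch (the limit of 8 is arbitrary; finer-grained throttling would need something like xargs -P):

limit=8    # arbitrary: at most 8 svn processes at a time
count=0
for f in */   # the glob form also survives spaces in names
do
    svn update "$f" > "$outdir/${f%/}" &
    count=$((count+1))
    if [ "$count" -ge "$limit" ]; then
        wait      # let the whole batch finish before starting the next
        count=0
    fi
done
wait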

ijw
So your point is that two "concurrent" echo commands which print a lot of text could get mixed up by the console?
ivan_ivanovich_ivanoff
Yes. I don't think 'echo' is atomic. That said, a quick test I've just run with 1MB files suggests it might be close enough to atomic that you don't care...
ijw
No, the echo is not atomic. You have N processes, all with STDOUT connected to the same file descriptor (the shell script's STDOUT). The example above changes the file descriptor associated with STDOUT in the subshell before it's launched, so each echo command has its own handle.
dannysauer
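(To make the file-descriptor point concrete, a quick contrast one can try, again just a sketch: in the first pair both jobs inherit the same stdout, so with enough output their writes can interleave; in the second, each subshell's stdout is redirected to its own file before launch, so each file stays intact.)

#!/bin/sh
# Shared descriptor: both jobs write to the script's stdout.
( yes A | head -n 100000 ) & ( yes B | head -n 100000 ) &
wait

# Per-subshell redirection: each job gets its own descriptor.
( yes A | head -n 100000 ) > a.out & ( yes B | head -n 100000 ) > b.out &
wait
cat a.out b.out    # each block comes out whole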