Hi, I'm writing a small script that creates archives in the main thread. After each archive is complete, it starts a new thread by calling a function that takes care of uploading that archive. I want the uploading to happen in the background so that the next archive can be created while the previous ones are still being uploaded.

The problem I'm having is at the very end of the script: the main thread doesn't wait for all the uploading threads to finish before exiting. Look at the following simplified script (I removed or changed the parts of the code not related to the problem):

function func {
for files in /home/somewhere/
  do
    echo "Uploading $1" &
  done
wait
}

find /home/some/path -type f | while read filename ; do
  echo "Creating archive of $filename"
  func $somevariable &
done

wait

Everything runs nicely until the last archive is created. Then the script ends before all the func threads finish, leaving many files not uploaded.

Thank you for your ideas.

A: 

You could loop until the jobs command returns nothing as an alternative method.
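
A minimal sketch of that polling loop, assuming bash. The upload function and the archive names are hypothetical stand-ins, and the loop only sees jobs started by the shell that runs it, so it has to run in the same shell that launched the background work:

```bash
#!/usr/bin/env bash
log=$(mktemp)

# Hypothetical stand-in for the real upload function.
upload() {
  sleep 0.2
  echo "uploaded $1" >> "$log"
}

for name in a.tar b.tar c.tar; do
  upload "$name" &
done

# Poll until this shell has no running background jobs left.
# (Fractional sleep works with GNU and BSD sleep; use "sleep 1" otherwise.)
while [ -n "$(jobs -r)" ]; do
  sleep 0.1
done

finished=$(grep -c '' "$log")
echo "finished uploads: $finished"
rm -f "$log"
```

A plain wait does the same thing without polling; the loop is mainly useful if you want to do something else (print progress, enforce a timeout) while waiting.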

Philluminati
+2  A: 

Update: good points in the comment.

So, on a second look, I think the problem is the subshell that is created by the | feeding the loop. It's a good way to structure the script, but you need to do the final wait in the shell that spun off the background tasks. Do something like this:

find /home/some/path -type f | (while read filename ; do
    echo "Creating archive of $filename"
    func $somevariable &
  done
  wait
)
DigitalRoss
Andrew
Thanks again for update :) I *think* that did the trick. I'll do some more tests.
Andrew
+1  A: 

If you execute wait with no arguments, it is supposed to wait for currently active child processes to complete.

The problem is likely to be that "all currently active child processes" does not mean what you think it means in this context. In particular, if you create pipelines in a subshell it is not entirely clear if they would be waited for in the parent shell.

I suspect that wait actually only waits for processes / pipelines that show up in the output of jobs. Try some experiments ...

A possible alternative may be to capture the child process ids and do a wait n call for each id.
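
Roughly, that could look like the sketch below (bash assumed). upload and the file names are made up for illustration. $! holds the PID of the most recent background job, and wait with a PID returns that job's exit status. Like a bare wait, this only works in the shell that started the jobs:

```bash
#!/usr/bin/env bash

# Hypothetical stand-in for the real upload function.
upload() {
  sleep 0.2
  [ "$1" != "bad.tar" ]   # simulate one failed upload
}

pids=()
for name in a.tar bad.tar c.tar; do
  upload "$name" &
  pids+=("$!")            # remember the PID of the job just started
done

failed=0
for pid in "${pids[@]}"; do
  # wait PID blocks until that child exits and returns its exit status.
  wait "$pid" || failed=$((failed + 1))
done

echo "failed uploads: $failed"
```

Compared with a bare wait, this also tells you which uploads failed, at the cost of keeping track of the PIDs yourself.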

Stephen C
A: 

Tricky! The problem is that this block

find /home/some/path -type f | while read filename ; do
  ...
done

creates a subshell, and the func $somevariable jobs are started in that subshell. The parent shell only sees that all the background jobs it created itself have finished; it doesn't keep track of background jobs created by subshells it spawned.
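
One way to see the subshell at work, assuming bash with its default settings (lastpipe off): a variable changed inside a piped while loop does not change in the parent, for the same reason the parent does not know about the background jobs. The variable name count is just for illustration:

```bash
#!/usr/bin/env bash
count=0

printf 'a\nb\nc\n' | while read -r line; do
  count=$((count + 1))   # updates the subshell's copy only
done

# The parent's variable is untouched: prints 0 under bash defaults.
echo "count after the pipe: $count"
```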

The easiest fix is to create your background jobs from the parent shell instead. You can avoid creating a subshell by not using a pipe:

while read filename ; do
  ...
done < <(find /home/some/path -type f)

Well, that still creates a subshell (for the find), but the while block is no longer in a subshell.
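
Put together, a runnable sketch of that approach might look like this. The temporary directory, the log file, and the upload function are placeholders I made up so the example is self-contained; process substitution is not available in plain POSIX sh, so this assumes bash:

```bash
#!/usr/bin/env bash
src=$(mktemp -d)    # placeholder for /home/some/path
log=$(mktemp)
touch "$src/one" "$src/two" "$src/three"

# Hypothetical stand-in for the real upload function.
upload() {
  sleep 0.2
  echo "uploaded $1" >> "$log"
}

# The loop runs in the current shell; only find runs in a subshell.
while IFS= read -r filename; do
  echo "Creating archive of $filename"
  upload "$filename" &
done < <(find "$src" -type f)

# This wait is in the same shell that started the jobs, so it blocks
# until every upload has finished.
wait

uploaded=$(grep -c '' "$log")
echo "uploads finished before exit: $uploaded"
rm -rf "$src" "$log"
```

IFS= and read -r are there so that file names with leading spaces or backslashes are read unchanged.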

profjim