I have a list of file URLs that I want to download:

http://somedomain.com/foo1.gz
http://somedomain.com/foo2.gz
http://somedomain.com/foo3.gz

What I want to do is the following for each file:

  1. Download foo1, foo2, ... in parallel with wget and nohup.
  2. As soon as each download completes, process that file with myscript.sh.

What I have is this:

#! /usr/bin/perl

@files = glob("foo*.gz");

foreach $file (@files) {
   my $downurls = "http://somedomain.com/".$file;
   system("nohup wget $file &");
   system("./myscript.sh $file >> output.txt");
}

The problem is that the above pipeline has no way of knowing when a file has finished downloading, so myscript.sh doesn't get executed properly.

What's the right way to achieve this?

+1  A: 

Try combining the commands using &&, so that the 2nd one runs only after the 1st one completes successfully.

system("(nohup wget $file  && ./myscript.sh $file >> output.txt) &");
codaddict
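The grouping idea above can also be sketched as a standalone shell script. This is only a minimal sketch under a few assumptions: fetch_then_process, download_all, FETCH and PROCESS are hypothetical names introduced here for illustration, while the URLs, myscript.sh and output.txt come from the question. Each URL gets its own background job, && keeps the processing step from running after a failed download, and wait holds the script until every job is done.

```shell
#!/bin/sh
# Hypothetical, overridable command names (assumptions, not from the original post).
FETCH='wget -q'
PROCESS='./myscript.sh'

# Download one URL, then process the file only if the download succeeded.
fetch_then_process() {
    url=$1
    file=${url##*/}    # strip everything up to the last slash, e.g. foo1.gz
    # && short-circuits: PROCESS runs only when FETCH exits with status 0.
    $FETCH "$url" && $PROCESS "$file" >> output.txt
}

# Start one background job per URL, then wait for all of them.
download_all() {
    for u in "$@"; do
        fetch_then_process "$u" &
    done
    wait    # returns once every background job has finished
}
```

It could be called as: download_all http://somedomain.com/foo1.gz http://somedomain.com/foo2.gz http://somedomain.com/foo3.gz. Note that this starts all downloads at once, with no upper limit on the number of jobs.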
+2  A: 

Why do this in Perl? Use bash instead. The following is just a sample.

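#!/bin/bash

for file in foo1 foo2 foo3
do
    # Download into the current directory (wget's default)
    wget "http://somedomain.com/$file.gz"

    # Process the file only if the download actually produced it
    if [ -f "$file.gz" ]
    then
        ./myscript.sh "$file.gz" >> output.txt
    fi
done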
Space
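The loop above handles one file at a time. If the downloads should run in parallel, as the question asks, one possible variant hands the URLs to xargs with its -P option (available in GNU and BSD xargs), which caps the number of simultaneous jobs. This is a hedged sketch rather than the answer author's method: run_bounded and WORKER are hypothetical names, and the limit of 2 in the usage line is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical per-URL command (an assumption); inside it, $1 is the URL.
# The file is processed only if wget succeeds.
WORKER='wget -q "$1" && ./myscript.sh "${1##*/}" >> output.txt'

# run_bounded MAX URL...  runs WORKER once per URL, at most MAX at a time.
run_bounded() {
    max_jobs=$1
    shift
    # -n 1: one URL per invocation; -P: number of parallel invocations.
    # The "_" fills $0 so that the URL arrives as $1 inside WORKER.
    printf '%s\n' "$@" | xargs -n 1 -P "$max_jobs" sh -c "$WORKER" _
}
```

For example: run_bounded 2 http://somedomain.com/foo1.gz http://somedomain.com/foo2.gz http://somedomain.com/foo3.gz. Because xargs returns only after all of its children have exited, no separate wait is needed.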
+1  A: 

If you want parallel processing, you can do it yourself with forking, or use a module that handles it for you. Try Parallel::ForkManager. You can see a bit more on its usage in http://stackoverflow.com/questions/2510306, but the CPAN page for the module has the really useful information. You probably want something like this:

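use strict;
use warnings;
use Parallel::ForkManager;

my $MAX_PROCESSES = 8; # 8 parallel processes max
my $pm = Parallel::ForkManager->new($MAX_PROCESSES);

# Remote file names are listed explicitly: glob() would only match
# files that already exist in the local directory.
my @files = qw(foo1.gz foo2.gz foo3.gz);

foreach my $file (@files) {
  # Forks; the parent receives the child's pid and moves on to the next file:
  my $pid = $pm->start and next;

  # Only the child reaches this point
  my $downurls = "http://somedomain.com/".$file;
  system("wget $downurls");
  system("./myscript.sh $file >> output.txt");

  $pm->finish; # Terminates the child process
}

$pm->wait_all_children; # Block until every child has finished

print "All done!\n";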
kbenson
The loop body is taken from the question's example. I suggest checking the return value of each system call to be sure that the commands executed correctly. You have to right-shift the return value by 8 to get the command's exit code: if ((system("command") >> 8) == 0) { ... }
kbenson
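For comparison, a shell script needs no shifting: Perl's system returns the raw wait status with the exit code in the high byte, whereas the shell exposes the exit code directly in $?. The following is a minimal sketch of the same check in shell; run_checked is a hypothetical helper name, not something from the thread.

```shell
#!/bin/sh
# run_checked CMD [ARGS...]
# Runs the command, reports a non-zero exit code on stderr,
# and returns that same code to the caller.
run_checked() {
    status=0
    "$@" || status=$?    # $? is already the plain exit code (0-255)
    if [ "$status" -ne 0 ]; then
        echo "FAILED ($status): $*" >&2
    fi
    return "$status"
}
```

A download-then-process step could then be written as: run_checked wget -q http://somedomain.com/foo1.gz && run_checked ./myscript.sh foo1.gz >> output.txt, so that a failed download is logged and the processing step is skipped.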