I've read the question a few times, and I think I sort of get what you
are trying to do. You have a control script. This script spawns
children to do some stuff, and these children spawn the grandchildren
to actually do the work. The problem is that the grandchildren can be
too slow (waiting for STDIN, or whatever), and you want to kill them.
Furthermore, if there is one slow grandchild, you want the entire
child to die (killing the other grandchildren, if possible).
So, I tried implementing this two ways. The first was to make the
parent spawn a child in a new UNIX session, set a timer for a few
seconds, and kill the entire child session when the timer went off.
This made the parent responsible for both the child and the
grandchildren. It also didn't work right.
The next strategy was to make the parent spawn the child, and then
make the child responsible for managing the grandchildren. It would
set a timer for each grandchild, and kill it if the process hadn't
exited by expiration time. This works great, so here is the code.
We'll use EV to manage the children and timers, and AnyEvent for the
API. (You can try another AnyEvent event loop, like Event or POE.
But I know that EV correctly handles the condition where a child exits
before you tell the loop to monitor it, which eliminates annoying race
conditions that other loops are vulnerable to.)
#!/usr/bin/env perl
use strict;
use warnings;
use feature ':5.10';
use AnyEvent;
use EV; # you need EV for the best child-handling abilities
We need to keep track of the child watchers:
# active child watchers
my %children;
Then we need to write a function to start the children. The things
the parent spawns are called children, and the things the children
spawn are called jobs.
sub start_child($$@) {
my ($on_success, $on_error, @jobs) = @_;
The arguments are a callback to be called when the child completes
successfully (meaning its jobs were also a success), a callback when
the child did not complete successfully, and then a list of coderef
jobs to run.
In this function, we need to fork. In the parent, we setup a child
watcher to monitor the child:
if(my $pid = fork){ # parent
# monitor the child process, inform our callback of error or success
say "$$: Starting child process $pid";
$children{$pid} = AnyEvent->child( pid => $pid, cb => sub {
my ($pid, $status) = @_;
delete $children{$pid};
say "$$: Child $pid exited with status $status";
if($status == 0){
$on_success->($pid);
}
else {
$on_error->($pid);
}
});
}
In the child, we actually run the jobs. This involves a little bit of
setup, though.
First, we forget the parent's child watchers, because it doesn't make
sense for the child to be informed of its siblings exiting. (Fork is
fun, because you inherit all of the parent's state, even when that
makes no sense at all.)
else { # child
# kill the inherited child watchers
%children = ();
my %timers;
We also need to know when all the jobs are done, and whether or not
they were all a success. We use a counting conditional variable to
determine when everything has exited. We increment on startup, and
decrement on exit, and when the count is 0, we know everything's done.
I also keep a boolean around to indicate error state. If a process
exits with a non-zero status, error goes to 1. Otherwise, it stays 0.
You might want to keep more state than this :)
# then start the kids
my $done = AnyEvent->condvar;
my $error = 0;
$done->begin;
(We also start the count at 1 so that if there are 0 jobs, our process
still exits.)
Now we need to fork for each job, and run the job. In the parent, we
do a few things. We increment the condvar. We set a timer to kill
the child if it's too slow. And we setup a child watcher, so we can
be informed of the job's exit status.
for my $job (@jobs) {
if(my $pid = fork){
say "[c] $$: starting job $job in $pid";
$done->begin;
# this is the timer that will kill the slow children
$timers{$pid} = AnyEvent->timer( after => 3, interval => 0, cb => sub {
delete $timers{$pid};
say "[c] $$: Killing $pid: too slow";
kill 9, $pid;
});
# this monitors the children and cancels the timer if
# it exits soon enough
$children{$pid} = AnyEvent->child( pid => $pid, cb => sub {
my ($pid, $status) = @_;
delete $timers{$pid};
delete $children{$pid};
say "[c] [j] $$: job $pid exited with status $status";
$error ||= ($status != 0);
$done->end;
});
}
Using the timer is a little bit easier than alarm, since it carries
state with it. Each timer knows which process to kill, and it's easy
to cancel the timer when the process exits successfully -- we just
delete it from the hash.
That's the parent (of the child). The child (of the child; or the
job) is really simple:
else {
# run kid
$job->();
exit 0; # just in case
}
You could also close stdin here, if you wanted to.
Now, after all the processes have been spawned, we wait for them to
all exit by waiting on the condvar. The event loop will monior the
children and timers, and do the right thing for us:
} # this is the end of the for @jobs loop
$done->end;
# block until all children have exited
$done->recv;
Then, when all the children have exited, we can do whatever cleanup
work we want, like:
if($error){
say "[c] $$: One of your children died.";
exit 1;
}
else {
say "[c] $$: All jobs completed successfully.";
exit 0;
}
} # end of "else { # child"
} # end of start_child
OK, so that's the child and grandchild/job. Now we just need to write
the parent, which is a lot easier.
Like the child, we are going to use a counting condvar to wait for our
children.
# main program
my $all_done = AnyEvent->condvar;
We need some jobs to do. Here's one that is always successful, and
one that will be successful if you press return, but will fail if you
just let it be killed by the timer:
my $good_grandchild = sub {
exit 0;
};
my $bad_grandchild = sub {
my $line = <STDIN>;
exit 0;
};
So then we just need to start the child jobs. If you remember way
back to the top of start_child
, it takes two callbacks, an error
callback, and a success callback. We'll set those up; the error
callback will print "not ok" and decrement the condvar, and the
success callback will print "ok" and do the same. Very simple.
my $ok = sub { $all_done->end; say "$$: $_[0] ok" };
my $nok = sub { $all_done->end; say "$$: $_[0] not ok" };
Then we can start a bunch of children with even more grandchildren
jobs:
say "starting...";
$all_done->begin for 1..4;
start_child $ok, $nok, ($good_grandchild, $good_grandchild, $good_grandchild);
start_child $ok, $nok, ($good_grandchild, $good_grandchild, $bad_grandchild);
start_child $ok, $nok, ($bad_grandchild, $bad_grandchild, $bad_grandchild);
start_child $ok, $nok, ($good_grandchild, $good_grandchild, $good_grandchild, $good_grandchild);
Two of those will timeout, and two will succeed. If you press enter
while they're running, though, then they might all succeed.
Anyway, once those have started, we just need to wait for them to
finish:
$all_done->recv;
say "...done";
exit 0;
And that's the program.
One thing that we aren't doing that Parallel::ForkManager does is
"rate limiting" our forks so that only n
children are running at a
time. This is pretty easy to manually implement, though:
use Coro;
use AnyEvent::Subprocess; # better abstraction than manually
# forking and making watchers
use Coro::Semaphore;
my $job = AnyEvent::Subprocess->new(
on_completion => sub {}, # replace later
code => sub { the child process };
)
my $rate_limit = Coro::Semaphore->new(3); # 3 procs at a time
my @coros = map { async {
my $guard = $rate_limit->guard;
$job->clone( on_completion => Coro::rouse_cb )->run($_);
Coro::rouse_wait;
}} ({ args => 'for first job' }, { args => 'for second job' }, ... );
# this waits for all jobs to complete
my @results = map { $_->join } @coros;
The advantage here is that you can do other things while your children
are running -- just spawn more threads with async
before you do the
blocking join. You also have a lot more control over the children
with AnyEvent::Subprocess -- you can run the child in a Pty and feed
it stdin (like with Expect), and you can capture its stdin and stdout
and stderr, or you can ignore those things, or whatever. You get to
decide, not some module author that's trying to make things "simple".
Anyway, hope this helps.