I have a long-running process with some child processes that must be restarted if they exit. To handle clean restarts of these child processes, I trap the child-exit signal (SIGCHLD) with:

trap("CLD") do
  cpid = Process.wait
  # ... handle cleanup ...
end

The long-running process occasionally needs to invoke 'curl' using a backquote as in

`/usr/bin/curl -m 60 http://localhost/central/expire`

The problem is that the backquote invocation raises a SIGCHLD and makes my trap fire. The handler then hangs in the CLD trap because Process.wait never returns. If there happen to be no (non-backquote) child processes at that time, Process.wait raises an Errno::ECHILD exception instead.

I can circumvent this problem by wrapping the backquote call with this line before:

sig_handler = trap("CLD", "IGNORE") #  Ignore child traps

and this line after the backquote invocation:

trap("CLD", sig_handler) # replace the handler

but this means that I may miss a signal from the (non-backquote) child processes during that window, so I'm not really happy with that.

So, is there a better way to do this? (I am using ruby 1.9.1p243 on GNU/Linux 2.6.22.6 if it matters)

Update: The code below illustrates the problem (and my current solution for it). There seems to be some strange timing issue here since I don't always get the ECHILD exception. But just once is enough to mess things up.

#!/usr/bin/env ruby
require 'pp'

trap("CLD") do
  cpid = nil
  begin
    puts "\nIn trap(CLD); about to call Process.wait"
    cpid = Process.wait 
    puts "In trap(CLD); Noting that ssh Child pid #{cpid}: terminated"
    puts "Finished Child termination trap"
  rescue Errno::ECHILD
    puts "Got Errno::ECHILD"
  rescue Exception => excep
    puts "Exception in CLD trap for process [#{cpid}]"
    puts PP.pp(excep, '')
    puts excep.backtrace.join("\n")
  end
end

#Backtick problem shown (we get an ECHILD most of the time)
puts "About to invoke backticked curl"
`/usr/bin/curl -m 6 http://developer.yahooapis.com/TimeService/V1/getTime?appid=YahooDemo`
sleep 2; sleep 2 # Need two sleeps because the 1st gets terminated early by the trap
puts "Backticked curl returns"

# Using spawn
puts "About to invoke curl using spawn"
cpid = spawn("/usr/bin/curl -m 6 http://developer.yahooapis.com/TimeService/V1/getTime?appid=YahooDemo")
puts "spawned child pid is #{cpid} at #{Time.now}"
A:

Start monitored subprocesses from a subprocess

Just start your tracked and monitored children from a child of your main process that never exits. That way the main process won't get a SIGCHLD for the monitored children, and the monitoring child won't see the backtick children exit.

And if you do this, you could avoid the use of SIGCHLD entirely, as you could just use a loop with a wait in it to notice child exit events.
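A minimal sketch of that wait loop, under some assumptions: start_worker and supervise are hypothetical names, the command arrays stand in for the real ssh tunnel commands, and max_restarts exists only so the sketch terminates (a real supervisor would loop forever).

```ruby
# Hypothetical helper: start one monitored worker and return its pid.
# cmd is an array such as ["sleep", "1"], standing in for the real command.
def start_worker(cmd)
  Process.spawn(*cmd)
end

# Restart each worker when it exits, without using SIGCHLD at all.
# Returns the number of restarts performed.
def supervise(commands, max_restarts)
  workers = {}
  commands.each { |cmd| workers[start_worker(cmd)] = cmd }

  restarts = 0
  until workers.empty?
    pid = Process.wait          # blocks until any child of this process exits
    cmd = workers.delete(pid)
    next if cmd.nil?            # not one of the tracked workers

    if restarts < max_restarts
      restarts += 1
      workers[start_worker(cmd)] = cmd
    end
  end
  restarts
end
```

The main process would call something like fork { supervise(commands, limit) } once at startup and then carry on with its own work. Backtick calls made in the main process create children of the main process, so the blocking Process.wait inside the supervisor never sees them.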

Other ideas:

  • Ignore one SIGCHLD every time you execute a backtick command. It seems to me that you might ignore a "real" SIGCHLD by accident this way, but that won't matter, because you would then get a "spurious" one that you would process instead.
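A variant of this idea, which is not what the bullet literally proposes: instead of counting signals to skip, the handler can poll only the pids it started, using Process::WNOHANG so it never blocks. A SIGCHLD caused by a backtick then finds nothing to reap and is effectively ignored. TRACKED, EXITED and reap_tracked are hypothetical names introduced for this sketch.

```ruby
# Hypothetical registry of monitored children: pid => description.
TRACKED = {}
# Pids the handler has reaped, for the main loop to restart later.
EXITED = []

# Reap only tracked children, without blocking.
# Returns the tracked pids that are no longer running.
def reap_tracked
  exited = []
  TRACKED.keys.each do |pid|
    begin
      # With WNOHANG, waitpid returns nil while the child is still running.
      exited << pid if Process.waitpid(pid, Process::WNOHANG)
    rescue Errno::ECHILD
      # Already reaped somewhere else; treat it as gone.
      exited << pid
    end
  end
  exited.each { |pid| TRACKED.delete(pid) }
  exited
end

# The handler never calls a bare Process.wait, so it cannot hang
# on a backtick child that Ruby has already reaped.
trap("CLD") { EXITED.concat(reap_tracked) }
```

One caveat with this sketch: a child that exits before its pid has been added to TRACKED is only picked up on the next SIGCHLD, so the main loop may also want to call reap_tracked periodically.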
DigitalRoss
Thanks, but to give further detail, the main process is in a polling loop that communicates using those child processes to other machines. The sub-processes are actually ssh clients that make tunnels through which drb communication is performed. So the main process is not just sitting around waiting to restart failed children (although it must do so when that happens)...
Mike Berrow
... However, I think that an inversion of your idea may be promising. If I launch the backtick invocation of curl from a forked sub process, I should be able to prevent the main (parent) process from receiving a SIGCHLD from the backtick completion.
Mike Berrow
I think you will still get SIGCHLD in that case.
DigitalRoss
You are right, but the forked process seems to send a SIGCHLD with the effect that the Process.wait in my CLD trap neither blocks (if I have other children) nor raises ECHILD (if I do not have other children). Using backtick, I either get stuck on Process.wait or get an Errno::ECHILD exception. I also found that using spawn instead of backtick solves the problem the same way. I will show example code in an answer to this.
Mike Berrow
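One way the spawn approach from the last comment might look when the command's output is still needed, as a backtick would provide it. This is only a sketch: capture_with_spawn is a hypothetical helper, and it deliberately does not wait for the child, so the child stays available for a SIGCHLD handler (or the caller) to reap.

```ruby
# Run a command with spawn and return [pid, stdout_text].
# The child is NOT reaped here; whoever calls Process.wait next will collect it.
def capture_with_spawn(*cmd)
  reader, writer = IO.pipe
  pid = Process.spawn(*cmd, out: writer)
  writer.close            # keep only the read end so read sees EOF when the child exits
  output = reader.read
  reader.close
  [pid, output]
end
```

Without a CLD trap installed, the caller would reap the child itself, for example with Process.wait(pid) after pid, output = capture_with_spawn("/usr/bin/curl", "-m", "60", "http://localhost/central/expire").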