I am running a Rails application that launches a series of relatively long-running scripts, using delayed_job and Rinda to farm out the work. At the moment I am using a single delayed_job process (rake jobs:work, as I am stuck on Windows), a ring server process (started with RingyDingy), and a single, fairly ordinary service process.

These all run on the same machine (though we plan to eventually have a ‘farm’ of services on more than one machine).

When we start a run of more than a few scripts (the longest run is about 48 scripts, with a total run time of up to 7 hours), we occasionally see a script fail with a RingNotFound error because it can’t find the ring server, even though the ring server process is running fine. The next script almost always finds the server and runs OK.

Does anyone have any ideas?

The code follows.

Worker excerpt (where the error occurs):

  def run_distributed 
   output = 'urk' 
   @aFullScript.distrib = true 

   DRb.start_service

   ring_finger = Rinda::RingFinger.new('127.0.0.1')
   sleep 2
   ring_server = ring_finger.lookup_ring_any
   sleep 2
   log_message(1, "ring server:\n#{ring_server.inspect}", __LINE__)

   service = ring_server.take([:name, :ScriptServer2, nil, nil])
   log_message(1, "service:\n#{service.inspect}", __LINE__)

   server = service[2]
   server.fullScript = @aFullScript
   log_message(1, "server:\n#{server.inspect}", __LINE__)

   begin
    output = server.run
    ring_server.write([:name, service[1], service[2], service[3]])
   rescue
    log_message(3, "In ScriptWorker2.perform #{$!} \n#{@aFullScript.to_yaml}", __LINE__)
   ensure
    return output
   end
  end
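
Since the next script almost always finds the ring server, one possible stopgap for the lookup in the excerpt above is to retry it a few times before giving up. The following is only a minimal sketch, not a diagnosis of the underlying cause: `with_retries` is a hypothetical helper (it is not part of Rinda or RingyDingy), and the attempt count and delay are arbitrary assumptions. It relies on `lookup_ring_any` signalling a failed lookup by raising a RuntimeError with the message "RingNotFound".

```ruby
# Hypothetical helper, not part of Rinda or RingyDingy.
# Calls the block, retrying when it raises a RuntimeError, and re-raises
# the last error once `attempts` tries have been used up.
# Positional defaults are used so it also runs on older Rubies.
def with_retries(attempts = 3, delay = 1.0)
  tries = 0
  begin
    tries += 1
    yield tries                 # the block receives the attempt number
  rescue RuntimeError => e
    raise if tries >= attempts  # give up: re-raise the error just rescued
    warn "attempt #{tries} failed (#{e.message}); retrying in #{delay}s"
    sleep delay
    retry                       # run the begin block again
  end
end

# Possible use inside run_distributed (assumed values: 5 attempts, 2 s apart):
#
#   ring_server = with_retries(5, 2) do
#     Rinda::RingFinger.new('127.0.0.1').lookup_ring_any
#   end
```

If the lookup still fails after the last attempt, the error propagates as before, so the existing logging in the worker would still see it.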

ring_server.rb:

  Dir.chdir('../vendor/gems/RingyDingy-1.2.1/lib')
  require 'rubygems'
  require 'ringy_dingy/ring_server'

  puts "ring server - waftt #{$$}"
  rs = RingyDingy::RingServer.new(:Verbose => true)
  rs.run

script_server2.rb:

  require 'rubygems'
  require 'rinda/ring'
  require 'drb'

  class ScriptServer2
   include DRbUndumped

   attr_accessor :server_output
   attr_accessor :fullScript

   def initialize
    @fullScript = ''
    @server_output = ''
   end

   def run
    @server_output = ''
    puts "****** Running #{@fullScript.myName} #{Time.now.strftime("%Y%m%d %H%M%S")} (#{@fullScript})"
    @server_output << @fullScript.doit
    # puts @server_output
    puts "****** Completed #{@fullScript.myName} #{Time.now.strftime("%Y%m%d %H%M%S")} (#{@fullScript})"
    @server_output
   end
  end

  DRb.start_service #( nil, ScriptServer2.new )
  myPid = Process.pid.to_s
  puts "ScriptServer2 #{myPid}"

  finger = Rinda::RingFinger.new('127.0.0.1')
  ring_server = finger.lookup_ring_any
  ring_server.write([:name,
            :ScriptServer2,
            ScriptServer2.new,
            "ScriptServer2 #{myPid}"],
           Rinda::SimpleRenewer.new)

  liveTS  = Time.now.strftime("%Y%m%d%H%M%S")
  puts "going live... #{liveTS}"
  DRb.thread.join

The command file that starts everything up:

  start "rails server - waftt" /i /min ruby script/server -p 9191 start
  sleep 6
  start "ring server - waftt" /i /min ruby script/ring_server.rb
  sleep 10
  start "script server 1 - waftt" /i /min ruby lib/script_server2.rb
  start "rake jobs:work 1 - waftt" /i /min rake jobs:work

Thanks! pat