I am running a Rails application that launches a series of relatively long-running scripts, using delayed_job and Rinda to farm out the work. At the moment I am using a single delayed_job process (rake jobs:work, as I am stuck on Windows), a ring server process (started with RingyDingy), and a single service process (pretty ordinary).
These all run on the same machine (though we plan to eventually have a ‘farm’ of services on more than one machine).
When we start a run of more than a few scripts (the longest run is about 48 scripts, with a total run time of up to 7 hours), we occasionally see a script fail with a RingNotFound error because it can’t find the ring server, even though the ring server process is running fine. The next script almost always finds the server and runs OK.
Does anyone have any ideas?
Code follows.
worker excerpt: (where the error occurs)
def run_distributed
  output = 'urk'
  @aFullScript.distrib = true
  DRb.start_service
  ring_finger = Rinda::RingFinger.new('127.0.0.1')
  sleep 2
  ring_server = ring_finger.lookup_ring_any
  sleep 2
  log_message(1, "ring server:\n#{ring_server.inspect}", __LINE__)
  service = ring_server.take([:name, :ScriptServer2, nil, nil])
  log_message(1, "service:\n#{service.inspect}", __LINE__)
  server = service[2]
  server.fullScript = @aFullScript
  log_message(1, "server:\n#{server.inspect}", __LINE__)
  begin
    output = server.run
    ring_server.write([:name, service[1], service[2], service[3]])
  rescue
    log_message(3, "In ScriptWorker2.perform #{$!} \n#{@aFullScript.to_yaml}", __LINE__)
  ensure
    return output
  end
end
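One workaround I am considering, since the next script almost always finds the server, is to retry the lookup a few times before giving up. This is only a rough sketch: with_ring_retry is a name I made up (it is not part of Rinda), and it assumes the failure surfaces as a RuntimeError whose message is 'RingNotFound', which is what I believe lookup_ring_any raises. It would hide the symptom rather than explain it.

```ruby
# Hypothetical helper, not part of Rinda or RingyDingy.
# Runs the block and retries it when it raises a RuntimeError whose
# message is 'RingNotFound' (assumed to be how the lookup failure appears).
# attempts: maximum number of tries; delay: seconds to sleep between tries.
def with_ring_retry(attempts = 3, delay = 1)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue RuntimeError => e
    # Re-raise anything that is not the ring lookup failure,
    # and give up once the attempt budget is spent.
    raise unless e.message == 'RingNotFound' && tries < attempts
    sleep delay
    retry
  end
end
```

In run_distributed this would wrap the lookup, for example ring_server = with_ring_retry(5, 2) { ring_finger.lookup_ring_any }. As far as I can tell, lookup_ring_any also accepts a timeout in seconds (default 5), so passing a larger value there might be another thing to try.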
ring_server.rb:
Dir.chdir('../vendor/gems/RingyDingy-1.2.1/lib')
require 'rubygems'
require 'ringy_dingy/ring_server'
puts "ring server - waftt #{$$}"
rs = RingyDingy::RingServer.new(:Verbose => true)
rs.run
script_server2.rb:
require 'rubygems'
require 'rinda/ring'
require 'drb'
class ScriptServer2
  include DRbUndumped
  attr_accessor :server_output
  attr_accessor :fullScript

  def initialize
    @fullScript = ''
    @server_output = ''
  end

  def run
    @server_output = ''
    puts "****** Running #{@fullScript.myName} #{Time.now.strftime("%Y%m%d %H%M%S")} (#{@fullScript})"
    @server_output << @fullScript.doit
    # puts @server_output
    puts "****** Completed #{@fullScript.myName} #{Time.now.strftime("%Y%m%d %H%M%S")} (#{@fullScript})"
    @server_output
  end
end
DRb.start_service #( nil, ScriptServer2.new )
myPid = Process.pid.to_s
puts "ScriptServer2 #{myPid}"
finger = Rinda::RingFinger.new('127.0.0.1')
ring_server = finger.lookup_ring_any
ring_server.write([:name,
                   :ScriptServer2,
                   ScriptServer2.new,
                   "ScriptServer2 #{myPid}"],
                  Rinda::SimpleRenewer.new)
liveTS = Time.now.strftime("%Y%m%d%H%M%S")
puts "going live... #{liveTS}"
DRb.thread.join
The command file that starts everything up:
start "rails server - waftt" /i /min ruby script/server -p 9191 start
sleep 6
start "ring server - waftt" /i /min ruby script/ring_server.rb
sleep 10
start "script server 1 - waftt" /i /min ruby lib/script_server2.rb
start "rake jobs:work 1 - waftt" /i /min rake jobs:work
Thanks! pat