I have a production web site with the following environment:
- Rails 2.3.5
- MySQL Server 5.1.33
- Enterprise Ruby 1.8.6 (2008-08-11 patchlevel 287) [x86_64-linux]
- mysql gem 2.7
- Old version of BackgrounDRb plugin running on 4 different servers for background tasks, with 5 different workers each (Ruby threads, not separate processes!).
One of the BackgrounDRb workers processes the job queue using a variation of "optimistic locking":
update_sql = "update jobs
set updated_at = CURRENT_TIMESTAMP,
in_process = 1
where id = #{job.id} and in_process = 0"
affected_rows = Job.connection.update(update_sql)
captured_job = affected_rows > 0 ? Job.find(job.id) : nil
The code above tries to update the record with the given ID and with extra condition for in_process field. So if this same record was already updated by a different server/process then UPDATE statement would just return 0 (zero) and the job would not be processed simultaneously by 2 different servers.
The problem is: sometimes "Job.connection.update(update_sql)" returns 0 (zero) even when the record was actually updated! I was only able to find that out after a heavy logging was added to the code. It only happens in Production at night when we have a heavy load...
My guess is that mysql gem uses some global variable (class-variable) for affected_rows that is shared across all 5 threads of BackgrounDRb process, but I'm not sure. I was looking through the code of mysql gem and ActiveRecord, but I couldn't understand how it really works.
Could you please help with explanation of how this could happen?
Update 2010-07-07: We decided to not use threads for job processing - that'll solve all our problems: every job processor would be a separate process :)