We have an asynchronous task that performs a potentially long-running calculation for an object. The result is then cached on the object. To prevent multiple tasks from repeating the same work, we added locking with an atomic SQL update:

UPDATE objects SET locked = 1 WHERE id = 1234 AND locked = 0

The locking is only for the asynchronous task. The object itself may still be updated by the user. If that happens, any unfinished task for an old version of the object should discard its results as they're likely out-of-date. This is also pretty easy to do with an atomic SQL update:

UPDATE objects SET results = '...' WHERE id = 1234 AND version = 1

If the object has been updated, its version won't match and so the results will be discarded.
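To make the two conditional updates concrete, here is a minimal sketch that imitates them in plain Ruby without a database. The class `FakeObjectRow` and its methods are hypothetical stand-ins, not part of the real code; each method returns the number of "rows affected", which is the value a caller would check after the real UPDATE.

```ruby
# Hypothetical in-memory stand-in for a single row of the objects table.
class FakeObjectRow
  attr_reader :locked, :version, :results

  def initialize
    @locked = 0
    @version = 1
    @results = nil
    @mutex = Mutex.new   # plays the role of the database's atomicity
  end

  # Imitates: UPDATE objects SET locked = 1 WHERE id = ? AND locked = 0
  # Returns the number of rows affected (0 or 1).
  def try_lock
    @mutex.synchronize do
      next 0 unless @locked == 0
      @locked = 1
      1
    end
  end

  # Imitates: UPDATE objects SET results = ? WHERE id = ? AND version = ?
  # Returns 0 when the version no longer matches, so the results are dropped.
  def store_results(results, expected_version)
    @mutex.synchronize do
      next 0 unless @version == expected_version
      @results = results
      1
    end
  end

  # Imitates a user edit, assumed here to bump the version by one.
  def user_update
    @mutex.synchronize { @version += 1 }
  end
end
```

A second call to `try_lock` returns 0, and `store_results` returns 0 once `user_update` has run, which are the two outcomes the tests need to cover.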

These two atomic updates should handle any possible race conditions. The question is how to verify that in unit tests.

The first semaphore is easy to test, as it is simply a matter of setting up two different tests with the two possible scenarios: (1) where the object is locked and (2) where the object is not locked. (We don't need to test the atomicity of the SQL query as that should be the responsibility of the database vendor.)

How does one test the second semaphore? The object needs to be changed by a third party some time after the first semaphore but before the second. This would require a pause in execution so that the update may be reliably and consistently performed, but I know of no support for injecting breakpoints with RSpec. Is there a way to do this? Or is there some other technique I'm overlooking for simulating such race conditions?

A: 

Maybe I'm not understanding your problem but I think this race condition is pretty easy to simulate:

var a = grab_record(1)
var b = grab_record(1)

modify_record(a)
modify_record(b)

update_record(a)
update_record(b) // <-- this update should fail

(sorry it's pseudocode, maybe someone can edit to make this look like an RSpec test, or at least like Ruby)
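One possible rendering of the pseudocode in plain Ruby is sketched below. It assumes a hypothetical in-memory `RecordStore` with a version number per row and a hypothetical `StaleRecordError`; neither comes from the original question, and a real test would go through the actual model and database instead.

```ruby
class StaleRecordError < StandardError; end

# Hypothetical in-memory table with a version number on each row.
class RecordStore
  def initialize
    @rows = { 1 => { version: 1, data: 'original' } }
  end

  # Returns a private copy, as a separate process would see the row.
  def grab_record(id)
    @rows.fetch(id).merge(id: id)
  end

  # Saves only if the row still has the version this copy was read at.
  def update_record(record)
    row = @rows.fetch(record[:id])
    unless row[:version] == record[:version]
      raise StaleRecordError, 'row changed since it was read'
    end
    @rows[record[:id]] = { version: row[:version] + 1, data: record[:data] }
  end
end

store = RecordStore.new
a = store.grab_record(1)
b = store.grab_record(1)

a[:data] = 'changed by a'   # modify_record(a)
b[:data] = 'changed by b'   # modify_record(b)

store.update_record(a)      # succeeds and bumps the stored version
begin
  store.update_record(b)    # b still carries the old version
rescue StaleRecordError => e
  puts "second update rejected: #{e.message}"
end
```

In an RSpec example the `begin`/`rescue` at the end would typically become an expectation that the second update raises.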

Martinho Fernandes
+4  A: 

You can borrow an idea from electronics manufacturing and put test hooks directly into the production code. Just as a circuit board can be manufactured with special places for test equipment to control and probe the circuit, we can do the same thing with the code.

Suppose we have some code inserting a row into the database:

class TestSubject

  def insert_unless_exists
    if !row_exists?
      insert_row
    end
  end

end

But this code is running on multiple computers. There's a race condition, then, since another process may insert the row between our test and our insert, causing a DuplicateKey exception. We want to test that our code handles the exception that results from that race condition. In order to do that, our test needs to insert the row after the call to row_exists? but before the call to insert_row. So let's add a test hook right there:

class TestSubject

  def insert_unless_exists
    if !row_exists?
      before_insert_row_hook
      insert_row
    end
  end

  def before_insert_row_hook
  end

end

When run in the wild, the hook does nothing except eat up a tiny bit of CPU time. But when the code is being tested for the race condition, the test monkey-patches before_insert_row_hook:

class TestSubject
  def before_insert_row_hook
    insert_row
  end
end

Isn't that sly? Like a parasitic wasp larva that has hijacked the body of an unsuspecting caterpillar, the test hijacked the code under test so that it will create the exact condition we need tested.
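Put together, the whole thing might look like the sketch below. The in-memory hash standing in for the table, the constructor arguments, and the `DuplicateKey` class definition are assumptions added so the example runs on its own. It overrides the hook on a single instance rather than reopening the class, which keeps the patch from leaking into other tests. The sketch only shows that the hook forces the exception; handling it is left to the code under test.

```ruby
class DuplicateKey < StandardError; end

# Same shape as TestSubject above, backed by a hypothetical in-memory table.
class TestSubject
  def initialize(table, key)
    @table = table
    @key = key
  end

  def insert_unless_exists
    if !row_exists?
      before_insert_row_hook
      insert_row
    end
  end

  def row_exists?
    @table.key?(@key)
  end

  def insert_row
    raise DuplicateKey, @key.to_s if @table.key?(@key)
    @table[@key] = true
  end

  # Does nothing in production; tests override it.
  def before_insert_row_hook
  end
end

# In the test, override the hook on this one instance only.
subject = TestSubject.new({}, 1234)
def subject.before_insert_row_hook
  insert_row   # the "other process" wins the race here
end

begin
  subject.insert_unless_exists
rescue DuplicateKey
  puts 'DuplicateKey raised, as the race would cause'
end
```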

This idea is as simple as the XOR cursor, so I suspect many programmers have independently invented it. I have found it to be generally useful for testing code with race conditions. I hope it helps.

Wayne Conrad
A-ha. That would do it. Though rather than add an explicit hook, I might just use `alias_method_chain` to extend the functionality of a method that _has_ to be called between the two semaphores anyway—the long-running task.
Ian
Ian, that would do it.
Wayne Conrad
+1 for using parasitic wasp larva in your simile.
aronchick
BTW, would love to see how you coded this with the chaining. Interesting idea.
aronchick
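For readers curious about the chaining variant, the following is a rough sketch under stated assumptions, not the code Ian actually wrote. `alias_method_chain` came from ActiveSupport and was later deprecated in favor of `Module#prepend`, so the sketch uses `prepend` to wrap the long-running method. `RecalculateTask`, `calculate`, `EditDuringCalculation`, and the hash standing in for the database row are all hypothetical names.

```ruby
# Hypothetical task; a hash stands in for the row in the objects table.
class RecalculateTask
  def initialize(row)
    @row = row
  end

  def run
    # Stands in for: UPDATE ... SET locked = 1 WHERE ... AND locked = 0
    return :already_locked unless @row[:locked] == 0
    @row[:locked] = 1

    version = @row[:version]
    results = calculate

    # Stands in for: UPDATE ... SET results = ? WHERE ... AND version = ?
    if @row[:version] == version
      @row[:results] = results
      :stored
    else
      :discarded
    end
  end

  # The long-running work that must run between the two updates.
  def calculate
    42
  end
end

# Test-only module: simulates a user edit while the calculation runs.
module EditDuringCalculation
  def calculate
    @row[:version] += 1
    super
  end
end

row = { locked: 0, version: 1, results: nil }
task = RecalculateTask.new(row)
task.singleton_class.prepend(EditDuringCalculation)
task.run   # => :discarded
```

Prepending to the singleton class limits the wrapper to one task instance, so no separate hook method is needed in the production code.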