views: 350 · answers: 2

We're building a web app on top of the Amazon Web Services stack, and I'm loving it so far.

We're also making full use of test-driven development, and that is proving to be fantastic as well.

I'm just hoping someone can help me out with an issue I've come across which relates to Amazon SimpleDB's "eventual consistency".

The best example of the issue arising is in a unit test which adds a user and then checks that the user was added successfully by making a call to fetch that newly added user.

I could easily go ahead and just write the tests for that, and they might all pass, but I'm aware of "eventual consistency" and the possibility that when I make the call to fetch the user, the user might not have actually been added yet. Obviously, if the fetch-user function is called and the user is not in the system, it will return false or a failure.

What I'd like to know is: what is the best way to handle this? I've seen suggestions of a function that sleeps for 5 seconds between requests and tries 10 times, and I've also seen solutions with exponential backoff. What is the optimal solution?
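For reference, the retry-with-backoff idea I've seen looks roughly like this (a sketch only; `fetch`, the attempt count, and the delays are placeholders, not anything from the SimpleDB API):

```python
import time

def fetch_with_backoff(fetch, key, max_attempts=5, base_delay=0.5):
    """Retry an eventually-consistent read with exponential backoff.

    `fetch` is assumed to return None while the item is not yet visible.
    """
    for attempt in range(max_attempts):
        result = fetch(key)
        if result is not None:
            return result
        # Double the wait each time: 0.5s, 1s, 2s, 4s, ...
        time.sleep(base_delay * (2 ** attempt))
    return None
```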

A: 

So are you testing the database, or the code that goes against it? If you are testing the code that goes against the database, then you should have tests that expect the code to add a user and then not return that user until, say, some random amount of time has passed. This is most easily set up using a fake DB that keeps track of the time since the last request and only returns the expected value after the specified time has elapsed. You'd have similar tests where the fake is set up to never return the value, to return it right away, and so on. Instrument your fake to keep track of the interactions, and you can specify in your test how you want your code to behave, so that your code follows the expected behavior (polling at least twice, for instance).
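A minimal sketch of such a fake (the class name, the `lag` parameter, and the put/get interface are all illustrative, not any particular library's API):

```python
import time

class FakeEventuallyConsistentDB:
    """Fake store: a write only becomes visible `lag` seconds after it is made."""

    def __init__(self, lag=2.0):
        self.lag = lag
        self._items = {}  # key -> (value, time written)

    def put(self, key, value):
        self._items[key] = (value, time.monotonic())

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, written_at = entry
        if time.monotonic() - written_at < self.lag:
            return None  # write not yet visible, mimicking eventual consistency
        return value
```

A test against this fake can assert that the code under test keeps polling until the value appears, without ever touching the real service.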

tvanfosson
I'm testing the code, and as Mocky has said I should be using mock objects. This brings up some other issues though. I'd love to hear your views on those. Thanks!
joelg
+2  A: 

I recommend against using the actual SimpleDB service for unit testing your own code. You would be testing your code + the SimpleDB client + the network + SimpleDB itself. What you need is a mock SimpleDB client to run unit tests against. That way you are only testing the code that needs to be tested. Test-driven development calls on you not to test whether the database works in the unit tests for your code.

If you are testing your own SimpleDB client code you can use a mock SimpleDB service or something like M/DB which is a SimpleDB clone you can run locally.

But this brings up a larger issue because SimpleDB provides eventual-consistency and not read-your-writes consistency. Your code will absolutely need to be able to deal with the fact that a newly added item will not immediately be returned from a get or a query.

I have no reason to think that your code can't handle it. I'm just saying that as a general rule when you run into problems like these with tests, it hints at issues that need to be considered with the code being tested. You may find that you want either a general layer of caching between your app code and SimpleDB or you may want a session cache that can provide read-your-writes consistency.

Mocky
Of course! We should definitely be using a mock SimpleDB client. Thanks so very much for your response, Mocky. What an appropriate name! I agree the code can handle it; the only part that's a problem is when you're doing TDD or BDD and you're testing things such as "user exists when created". That's obviously an issue in the tests, but not normally in the app, since a few seconds' delay is OK in a human-interaction environment. Is a caching layer that caches the writes being sent a good idea? Could you suggest how that should work?
joelg
If you only have one SimpleDB client machine you can use a simple read-write cache locally and store writes and the values you get back from reads, which would probably be the case during testing. With more concurrent readers and writers on different machines a distributed cache like Memcached will save you from having to deal with a lot of stale local cache values.
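A single-machine version of that read-write cache could look something like this (a sketch under the assumption that the underlying store exposes `put`/`get`; the names are illustrative):

```python
class ReadYourWritesCache:
    """Local cache layered over an eventually-consistent store.

    Writes go to both the store and the cache; reads check the cache first,
    so this client always sees its own recent writes even before they are
    visible in the store.
    """

    def __init__(self, store):
        self.store = store
        self._cache = {}

    def put(self, key, value):
        self.store.put(key, value)  # write through to the real store
        self._cache[key] = value    # remember it locally

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        return self.store.get(key)
```

With multiple machines, the same pattern applies, but the local dict would be replaced by a shared cache such as Memcached, and stale-entry eviction becomes a real concern.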
Mocky
Thanks Mocky, those are great answers and plenty for me to think about!
joelg