I'm reading an article about the recently released Gizzard sharding framework by Twitter (http://engineering.twitter.com/2010/04/introducing-gizzard-framework-for.html). It mentions that all write operations must be idempotent to ensure high reliability.

According to Wikipedia, "Idempotent operations are operations that can be applied multiple times without changing the result." But, IMHO, in the Gizzard case, idempotent write operations should also be operations whose order doesn't matter.

Now, my question is: how do I make a write operation idempotent?

The only thing I can imagine is attaching a version number to each write. For example, in a blog system, each blog has a $blog_id and $content. At the application level, we always write blog content like this: write($blog_id, $content, $version). The application guarantees that $version is unique. So if the application first sets a blog to "Hello world" and then wants it to be "Goodbye", the writes are idempotent. We have these two write operations:

write($blog_id, "Hello world", 1);
write($blog_id, "Goodbye", 2);

These two operations are supposed to change two different records in the DB. So no matter how many times, or in what order, these two operations are executed, the result is the same.
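
To make the idea concrete, here is a minimal sketch in Python, assuming an in-memory dict stands in for the DB. The names store, write and latest are hypothetical and are not part of Gizzard. Each (blog_id, version) pair maps to its own record, so repeating or reordering the writes leaves the same stored state.

```python
# Hypothetical in-memory stand-in for the database.
# Keys are (blog_id, version) pairs, so each versioned write has its own record.
store = {}


def write(blog_id, content, version):
    """Store content under (blog_id, version).

    Calling this again with the same arguments overwrites the record
    with an identical value, so the stored state does not change.
    """
    store[(blog_id, version)] = content


def latest(blog_id):
    """Return the content with the highest version for this blog, or None."""
    versions = [v for (b, v) in store if b == blog_id]
    if not versions:
        return None
    return store[(blog_id, max(versions))]


# The two writes from the question, replayed out of order and with a duplicate.
write(7, "Goodbye", 2)
write(7, "Hello world", 1)
write(7, "Goodbye", 2)
```

Reading the current content then means picking the record with the highest version, which is the step that makes the order of arrival irrelevant.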

This is just my understanding. Please correct me if I'm wrong.

+1  A: 

You've got the right idea. Setting a particular value is idempotent, because if you carry out that operation more than once, you have the same result. The classic non-idempotent write is an append, because repetition would lead to multiple copies being appended.
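
A small Python sketch of the difference, assuming a record is just a dict (set_title and append_comment are hypothetical helpers for illustration only):

```python
def set_title(record, title):
    """Idempotent: assigning the same value twice leaves the same state."""
    record["title"] = title
    return record


def append_comment(record, comment):
    """Not idempotent: every repetition adds another copy."""
    record["comments"].append(comment)
    return record


record = {"title": "", "comments": []}

# Replaying the set has no further effect after the first call.
set_title(record, "Hello world")
set_title(record, "Hello world")

# Replaying the append duplicates the data.
append_comment(record, "first!")
append_comment(record, "first!")
```

After the replay, the title is still a single value, while the comments list holds two copies of the same comment.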

Also, see this previous Stack Overflow question.

Rob Lachlan
+1  A: 

You're absolutely right. Idempotent operations by themselves can provide only one conflict resolution pattern: "last write wins". That is a possible solution if your writes cannot be reordered in time. If they can, you should provide additional information to make automatic conflict resolution possible. And the idea you describe is not new. In the general case this is called vector clocks.

We use version-based conflict resolution in one of our systems, which collects the change history of objects in our system. Clients send the full object state and version information to the history module (asynchronously). The history module can then reorder the object states correctly and save only the delta to persistent storage. The only restriction is that the client should use some sort of concurrency control when making changes to an object (optimistic locking is a very good method if you track the object state version).
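
A rough Python sketch of such a history module, under the assumption that versions are integers and object states are flat dicts. The History class and its method names are made up for illustration and are not the actual system described above; field deletions are not tracked in this simplified version.

```python
class History:
    """Hypothetical history module: collects full states keyed by version."""

    def __init__(self):
        self.states = {}  # version -> full object state

    def receive(self, version, state):
        """Accept a state in any order; a repeated version is ignored."""
        self.states.setdefault(version, dict(state))

    def deltas(self):
        """Return [(version, changed_fields)] in ascending version order.

        Only fields that are new or differ from the previous version are
        kept. Removed fields are not recorded in this simplified sketch.
        """
        result = []
        previous = {}
        for version in sorted(self.states):
            current = self.states[version]
            changed = {
                key: value
                for key, value in current.items()
                if key not in previous or previous[key] != value
            }
            result.append((version, changed))
            previous = current
        return result


history = History()
# Messages arrive asynchronously: out of order and with a duplicate.
history.receive(2, {"title": "Post", "body": "Goodbye"})
history.receive(1, {"title": "Post", "body": "Hello world"})
history.receive(2, {"title": "Post", "body": "Goodbye"})
```

Because states are keyed by version, duplicate deliveries are harmless, and sorting by version before computing the deltas restores the intended order.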

dotsid