I'm currently exploring worst-case scenarios of atomic commit protocols like 2PC and 3PC, and I'm stuck because I can't work out why 3PC can guarantee atomicity. That is, how does it guarantee that if cohort A commits, cohort B also commits?
Here's the simplified 3PC from the Wikipedia article:
Now let's assume the following case:
- Two cohorts participate in the transaction (A and B)
- Both do their work, then vote for commit
- Coordinator now sends precommit messages...
- A receives the precommit message, acknowledges, and then goes offline for a long time
- B doesn't receive the precommit message (whatever the reason might be) and is thus still in the "uncertain" state
The results:
- Coordinator aborts the transaction because not all precommit messages were sent and acknowledged successfully
- A, which is in the precommit state, is still offline, so it times out and commits
- B aborts in any case: it either stays offline and times out (which causes an abort) or comes online and receives the abort command from the coordinator
And there you have it: One cohort committed, another aborted. The transaction is screwed.
So what am I missing here? In my understanding, if the automatic commit on timeout (in the precommit state) were replaced by waiting indefinitely for a coordinator command, this case would work fine.
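
To make the scenario concrete, here is a minimal sketch in Python of the state transitions as I understand them. All names (`State`, `on_timeout`, `run_scenario`, `wait_in_precommit`) are hypothetical and made up for this illustration, and the timeout rules are only my reading of the simplified protocol, not an authoritative 3PC implementation. The `wait_in_precommit` flag models the change I suggested above: a cohort in the precommit state keeps waiting for the coordinator instead of committing on timeout.

```python
from enum import Enum


class State(Enum):
    UNCERTAIN = "uncertain"   # voted yes, no precommit seen yet
    PRECOMMIT = "precommit"   # precommit received and acknowledged
    COMMITTED = "committed"
    ABORTED = "aborted"
    BLOCKED = "blocked"       # still waiting for a coordinator command


def on_timeout(state, wait_in_precommit):
    """Assumed timeout rule for a cohort that hears nothing from the coordinator."""
    if state is State.UNCERTAIN:
        return State.ABORTED
    if state is State.PRECOMMIT:
        # Assumption: standard behaviour is commit-on-timeout; the flag swaps
        # that for waiting on the coordinator instead.
        return State.BLOCKED if wait_in_precommit else State.COMMITTED
    return state


def run_scenario(wait_in_precommit=False):
    # Both cohorts did their work and voted for commit.
    cohorts = {"A": State.UNCERTAIN, "B": State.UNCERTAIN}

    # The precommit message reaches A only; A acknowledges, then goes offline.
    cohorts["A"] = State.PRECOMMIT
    acks = {"A"}

    # The coordinator aborts because not every cohort acknowledged precommit.
    coordinator = State.COMMITTED if acks == set(cohorts) else State.ABORTED

    # A is offline and times out. B times out too (receiving the abort
    # command instead would lead to the same result for B).
    for name in cohorts:
        cohorts[name] = on_timeout(cohorts[name], wait_in_precommit)

    # A eventually comes back online; a cohort that is still waiting
    # adopts the coordinator's decision.
    if cohorts["A"] is State.BLOCKED:
        cohorts["A"] = coordinator

    return coordinator, cohorts


for flag in (False, True):
    coordinator, cohorts = run_scenario(wait_in_precommit=flag)
    print(flag, coordinator.value, {name: s.value for name, s in cohorts.items()})
```

Under these assumptions, the default run ends with A committed and B aborted, while the run with `wait_in_precommit=True` ends with both cohorts aborted, at the cost of A blocking until it can reach the coordinator again.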