ansaurus

Question

How do distributed transactions work (eg. MSDTC)?

Answer 1

A:

About "step 4":

The transaction manager coordinates with the resource managers to ensure that all succeed to do the requested work or none of the work if done, thus maintaining the ACID properties.

This of course requires all participants to provide the proper interfaces and (error-free) implementations. The interface looks like vaguely this:

public interface ITransactionParticipant {
    bool WouldCommitWork();
    void Commit();
    void Rollback();
}

The Transaction manager at commit-time queries all participants whether they are willing to commit the transaction. The participants may only assert this if they are able to commit this transaction under all allowable error conditions (validation, system errors, etc). After all participants have asserted the ability to commit the transaction, the manager sends the Commit() message to all participants. If any participant instead raises an error or times out, the whole transaction aborts and individual members are rolled back.

This protocol requires participants to have recorded their whole transaction content before asserting their ability to commit. Of course this has to be in a special local transatction log structure to be able to recover from various kinds of failures.

David Schmitt 2008-09-11 06:31:04

What happens if you have participants A and B, A gets committed and returns success, then the B gets committed and returns failure but before A can be rolled back, the network drops? Another cases would be the network failure could prevent B from being committed.

BCS 2009-04-28 22:35:37

B may not fail on Commit() after returning true on WouldCommitWork() and A would not be committed before all participants return true on WouldCommitWork(). All steps of the protocol have associated timeouts which cause automatic rollbacks when hit. If a part of the cluster fails it will have to replay the log from a non-failing member to be able to join again. Fail-proofing clusters against byzantine errors is a hot research topic and requires more than two participants.

David Schmitt 2009-04-29 07:41:54

The only way B can prevent failure is to lock out any changes that could cause failure, but OK so scratch the first option. OTOH hardware failure still remains. -- The cases I'm interested in is where any link can fail while both system continue to provide service to other clients. In that case the system need to be coherent even while a dropped link is down. I don't see how they can ever agree on if transaction is to be committed and be sure that every one comes to the same conclusion.

BCS 2009-04-29 18:44:27

That's the point of WouldCommitWork(): only legal transactions pass on to Commit(). Regarding the split-brain possibility of having both A and B servicing users without being able to communicate anymore: You need at least 2N+1 (N>0) servers to avoid that and servers who find themselves in a minority group must shutdown. See http://techthoughts.typepad.com/managing_computers/2007/10/split-brain-quo.html and also http://en.wikipedia.org/wiki/Two-phase_commit_protocol

David Schmitt 2009-04-30 07:25:04

ansaurus

tags:

views:

answers:

How do distributed transactions work (eg. MSDTC)?

related questions