I understand, in a fuzzy sort of way, how regular ACID transactions work. You perform some work on a database in such a way that the work is not confirmed until some kind of commit flag is set. The commit part is based on some underlying assumption (like a single disk block write is atomic). In the event of a catastrophic error, you can just clear out the uncommitted data in the recovery phase.

How do distributed transactions work? In some of the MS documentation I have read that you can somehow perform a transaction across databases and filesystems (among other things).

This technology could be (and probably is) used for installers, where you want the program to be fully installed or fully absent. You simply begin a transaction at the start of the installer. Next you could connect to the registry and filesystem, making the changes that define the installation. When the job is done, simply commit, or rollback if the installation fails for some reason. The registry and filesystem are automatically cleaned for you by this magical distributed transaction coordinator.

How is it possible that two disparate systems can be transacted upon in this fashion? It seems to me that it is always possible to leave the system in an inconsistent state, where the filesystem has committed its changes and the registry has not. I think in MSDTC it is even possible to perform a transaction across the network.

I have read http://blogs.msdn.com/florinlazar/archive/2004/03/04/84199.aspx, but it feels like only the beginning of the explanation, and that step 4 should be expanded considerably.

Edit: From what I gather on http://en.wikipedia.org/wiki/Distributed_transaction, it can be accomplished by a two-phase commit (http://en.wikipedia.org/wiki/Two-phase_commit). Even after reading this, I still don't fully understand the method; there seems to be a lot of room for error between the steps.

A: 

About "step 4":

The transaction manager coordinates with the resource managers to ensure that either all of them succeed in doing the requested work or none of the work is done, thus maintaining the ACID properties.

This of course requires all participants to provide the proper interfaces and (error-free) implementations. The interface looks vaguely like this:

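public interface ITransactionParticipant {
    // Phase 1 (vote): return true only if a later Commit() is guaranteed to succeed.
    bool WouldCommitWork();
    // Phase 2: make the prepared work permanent.
    void Commit();
    // Phase 2 alternative: discard the prepared work.
    void Rollback();
}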

At commit time, the transaction manager asks all participants whether they are willing to commit the transaction. A participant may only assert this if it is able to commit the transaction under all allowable error conditions (validation, system errors, etc.). After all participants have asserted their ability to commit, the manager sends the Commit() message to all of them. If any participant instead raises an error or times out, the whole transaction aborts and every participant is rolled back.
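As a rough illustration, the vote-then-commit sequence described here could be sketched in Python as follows. The Participant class, its can_commit flag, and run_transaction are hypothetical names invented for this sketch. They mirror the interface above and are not the API of MSDTC or any real transaction manager.

```python
from typing import List


class Participant:
    """Hypothetical in-memory participant (a stand-in for a resource manager)."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit  # assumption: fixed vote, for illustration
        self.state = "active"

    def would_commit_work(self) -> bool:
        # Phase 1: vote. A real participant would make its work durable first.
        if self.can_commit:
            self.state = "prepared"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled back"


def run_transaction(participants: List[Participant]) -> bool:
    """Return True if everyone committed, False if everyone was rolled back."""
    try:
        # Phase 1: collect votes; all() stops at the first "no".
        unanimous = all(p.would_commit_work() for p in participants)
    except Exception:
        unanimous = False  # an error while voting counts as a "no" vote
    # Phase 2: apply the same decision to every participant.
    for p in participants:
        if unanimous:
            p.commit()
        else:
            p.rollback()
    return unanimous
```

For example, run_transaction([Participant("filesystem"), Participant("registry", can_commit=False)]) returns False and leaves both participants in the "rolled back" state. The sketch ignores network failures and crashes between the two phases, which is exactly where the real protocol needs its log and timeouts.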

This protocol requires participants to have recorded their whole transaction content before asserting their ability to commit. Of course, this has to go into a special local transaction log structure so that the participant can recover from various kinds of failures.
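A minimal sketch of such a participant-side log, assuming a simple append-only file of JSON lines, might look like this. LoggingParticipant, recover, the record layout, and the tx_id and changes parameters are all made up for illustration; real resource managers use their own log formats.

```python
import json
import os


class LoggingParticipant:
    """Hypothetical participant that logs its work before voting yes."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path

    def _append(self, record: dict) -> None:
        # Append one record and force it to disk before returning.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def would_commit_work(self, tx_id: str, changes: dict) -> bool:
        # The whole transaction content is durable before the "yes" vote.
        self._append({"tx": tx_id, "state": "prepared", "changes": changes})
        return True

    def commit(self, tx_id: str) -> None:
        self._append({"tx": tx_id, "state": "committed"})

    def rollback(self, tx_id: str) -> None:
        self._append({"tx": tx_id, "state": "rolled back"})


def recover(log_path: str) -> dict:
    """Replay the log and return the last recorded state of each transaction."""
    states = {}
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            record = json.loads(line)
            states[record["tx"]] = record["state"]
    return states
```

After a crash, any transaction that recover() still reports as "prepared" is in doubt: the participant has promised to commit but does not yet know the outcome, so it has to ask the transaction manager before it can finish or undo the work.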

David Schmitt
What happens if you have participants A and B, A gets committed and returns success, then B gets committed and returns failure, but before A can be rolled back, the network drops? Another case would be a network failure that prevents B from being committed.
BCS
B may not fail on Commit() after returning true on WouldCommitWork(), and A would not be committed before all participants return true on WouldCommitWork(). All steps of the protocol have associated timeouts which cause automatic rollbacks when hit. If a part of the cluster fails, it will have to replay the log from a non-failing member to be able to join again. Fail-proofing clusters against Byzantine errors is a hot research topic and requires more than two participants.
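To make the timeout rule concrete, here is a small Python sketch of a vote collector that treats a missing answer as a "no". The function name collect_votes and its parameters are hypothetical, and running the votes on a thread pool is just one convenient way to put a deadline on them.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as VoteTimeout


def collect_votes(vote_functions, timeout_seconds: float) -> bool:
    """Return True only if every vote arrives before the deadline and is True."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(vote_functions)))
    futures = [pool.submit(vote) for vote in vote_functions]
    deadline = time.monotonic() + timeout_seconds
    try:
        for future in futures:
            remaining = max(0.0, deadline - time.monotonic())
            if not future.result(timeout=remaining):
                return False  # an explicit "no" vote
        return True
    except VoteTimeout:
        return False  # no answer in time: treat as "no" and roll back
    except Exception:
        return False  # the participant raised an error: also a "no"
    finally:
        pool.shutdown(wait=False)  # do not block on slow participants
```

A caller would commit only when collect_votes(...) returns True and roll back every participant otherwise.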
David Schmitt
The only way B can prevent failure is to lock out any changes that could cause failure, but OK, so scratch the first option. OTOH, hardware failure still remains. -- The case I'm interested in is where any link can fail while both systems continue to provide service to other clients. In that case the system needs to be coherent even while a dropped link is down. I don't see how they can ever agree on whether the transaction is to be committed and be sure that everyone comes to the same conclusion.
BCS
That's the point of WouldCommitWork(): only legal transactions pass on to Commit(). Regarding the split-brain possibility of having both A and B servicing users without being able to communicate anymore: You need at least 2N+1 (N>0) servers to avoid that, and servers that find themselves in a minority group must shut down. See http://techthoughts.typepad.com/managing_computers/2007/10/split-brain-quo.html and also http://en.wikipedia.org/wiki/Two-phase_commit_protocol
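The majority rule itself is simple arithmetic. As a sketch in Python (has_quorum is a hypothetical helper, not part of any real cluster manager):

```python
def has_quorum(reachable: int, cluster_size: int) -> bool:
    """True if `reachable` servers (counting this one) form a strict majority."""
    return 2 * reachable > cluster_size


# With 3 servers (N = 1), a 2/1 split leaves exactly one side in charge.
print(has_quorum(2, 3))  # majority side keeps running
print(has_quorum(1, 3))  # minority side must shut down
# With only 2 servers, a 1/1 split gives neither side a majority.
print(has_quorum(1, 2))
```

Because two disjoint groups cannot both hold more than half of the servers, at most one side of a split keeps servicing users.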
David Schmitt