views:

421

answers:

6

I am studying how two-phase commit works across a distributed transaction. It is my understanding that in the last part of the phase the transaction coordinator asks each node whether it is ready to commit. If everyone agreed, then it tells them to go ahead and commit.

What prevents the following failure?

  1. All nodes respond that they are ready to commit
  2. The transaction coordinator tells them to "go ahead and commit" but one of the nodes crashes before receiving this message
  3. All other nodes commit successfully, but now the distributed transaction is corrupt
  4. It is my understanding that when the crashed node comes back, its transaction will have been rolled back (since it never got the commit message)

I am assuming each node is running a normal database that doesn't know anything about distributed transactions. What did I miss?

A: 

When the "bad" node doesn't acknowledge the commit or sends an error back to the coordinatng node, all the other nodes are instructed to roll the transaction back (from their undo logs)

cagcowboy
No; that's not correct. It would be correct if some node failed to respond 'ready to commit', but point 1 says they've already passed that point.
Jonathan Leffler
+5  A: 

No. Point 4 is incorrect. Each node records in stable storage that it was able to commit or rollback the transaction, so that it will be able to do as commanded even across crashes. When the crashed node comes back up, it must realize that it has a transaction in pre-commit state, reinstate any relevant locks or other controls, and then attempt to contact the coordinator site to collect the status of the transaction.

The problems only occur if the crashed node never comes back up (then everything else thinks the transaction was OK, or will be when the crashed node comes back).

Jonathan Leffler
Doesn't this assume that the database is aware of distributed transactions? I thought you are supposed to be able to fit distributed transactions on top of normal databases that know nothing of it...
Gili
No; the DBMS have to be aware of their responsibilities in the 2PC protocol, and the key responsibility is to retain a transaction in a Go/No-Go state until the coordinator gives the command. You lose (autonomy) at that point. Informix implements heuristic rollback to deal with vanished systems.
Jonathan Leffler
+5  A: 

Two phase commit isn't foolproof and is just designed to work in the 99% of the time cases.

"The protocol assumes that there is stable storage at each node with a write-ahead log, that no node crashes forever, that the data in the write-ahead log is never lost or corrupted in a crash, and that any two nodes can communicate with each other."

http://en.wikipedia.org/wiki/Two-phase_commit_protocol

reno
Even if the assumptions you mentioned are met I don't see how the node is supposed to know to replay the transaction (instead of rolling it back) when it comes back from a crash.As far as I understand it, the database isn't aware of the distributed transaction; something on top of it does.
Gili
@Gili: the database has to know about the distributed transaction to support 2PC correctly. If you build something on top, then you don't have a reliable 2PC implementation.
Jonathan Leffler
+7  A: 

No, they are not instructed to roll back because in the original poster's scenario, some of the nodes have already committed. What happens is when the crashed node becomes available, the transaction coordinator tells it to commit again.

Because the node responded positively in the "prepare" phase, it is required to be able to "commit", even when it comes back from a crash.

binarycoder
I don't understand how the crashed node is supposed to be able to replay the transaction up to the position where it left off.If you're fitting a distributed transaction library on top of a normal database, what prevents the DB from rolling back on a crash?
Gili
Gili: Because it promised that it would be able to, in the prepare phase of 2PC.
Thomas
+3  A: 

There are many ways to attack the problems with two-phase commit. Almost all of them wind up as some variant of the Paxos three-phase commit algorithm. Mike Burrows, who designed the Chubby lock service at Google which is based on Paxos, said that there are two types of distributed commit algorithms - "Paxos, and incorrect ones" - in a lecture I saw.

One thing the crashed node could do, when it reawakes, is say "I never heard about this transaction, should it have been committed?" to the coordinator, which will tell it what the vote was.

Bear in mind that this is an example of a more general problem: the crashed node could miss many transactions before it recovers. Therefore it's terribly important that upon recovery it should talk either to the coordinator or another replica before making itself available. If the node itself can't tell whether or not it has crashed, then things get more involved but still tractable.

If you use a quorum system for database reads, the inconsistency will be masked (and made known to the database itself).

HenryR
How does the node replay the transaction when it reawakens? As far as I understand it, normal databases *always* roll back on a crash and they're not aware of distributed transactions (you fit some library on top of them). As such, I don't see how you can make this work.
Gili
+2  A: 

Summarizing everyone's answers:

  1. One cannot use normal databases with distributed transactions. The database must explicitly support a transaction coordinator.

  2. The nodes are not instructed to roll back because some of the nodes have already committed. What happens is that when the crashed node comes back, the transaction coordinator tells it to commit again.

Gili
What happens if the coordinator crashes while the node is down, and the node wakes up while the coordinator is still down? What does the node do then?
Aviad P.
I suggest opening a separate question for this issue. I don't know the answer myself but I'm guessing that the node waits for the coordinator to come back before accepting client connections.
Gili