views:

188

answers:

2

Hey guys,

Can someone tell me the principle of how TransactionScope guarantees data integrity across multiple databases? I imagine it first sends the commands to the databases and then waits for the databases to respond before sending them a message to apply the command sent earlier. However, if execution stops abruptly while those apply messages are being sent, we could still end up with one database that has applied the command and one that has not. Can anyone shed some light on this?

Edit:

I guess what I'm asking is: can I rely on TransactionScope to guarantee data integrity when writing to multiple databases, in case of a power outage or a sudden shutdown?

Thanks, Bas

Example:

    using (var scope = new TransactionScope())
    {
        using (var context = new FirstEntities())
        {
            context.AddToSomethingSet(new Something());
            context.SaveChanges();
        }

        using (var context = new SecondEntities())
        {
            context.AddToSomethingElseSet(new SomethingElse());
            context.SaveChanges();
        }

        scope.Complete();
    }
+5  A: 

It promotes the transaction to the Distributed Transaction Coordinator (MSDTC) if it detects multiple databases, which enlists each scope as part of a two-phase commit (2PC). Each scope votes to commit, and hence we get the ACID properties, but distributed across databases. It can also be integrated with TxF and TxR. You should be able to use it the way you describe.

The two databases stay consistent because the distributed COM+ transaction running under MSDTC has a database transaction attached to each of them.

When a database votes to commit (e.g. when the scope completes successfully), it tells the DTC that it votes to commit; at that point its change list for the transaction has been forced to its transaction log. As long as the database transactions don't deadlock or conflict with other transactions at this stage (e.g. get pre-empted by a fairness algorithm), all operations for each database are in the transaction log. If the system loses power after the commit has finished for one database but not yet for another, it has already been recorded in the transaction log that all resources voted to commit, so there is no logical reason for the commit to fail. Hence, the next time the database that wasn't able to commit boots up, it will finish the transactions left in this indeterminate state.
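To make the recovery argument concrete, here is a minimal, language-neutral simulation of that flow in Python. It is a sketch only: `Participant`, `two_phase_commit`, and the log strings are illustrative names I've made up, not MSDTC internals. The point is that the commit decision is made durable *before* any participant applies it, so a crash between the two "apply" messages leaves the second database in-doubt, not inconsistent.

```python
# Illustrative sketch of 2PC with a durable log and crash recovery.
# All names here are made up for the example; this is not MSDTC's API.

class Participant:
    def __init__(self, name):
        self.name = name
        self.log = []          # stands in for the database transaction log
        self.committed = False

    def prepare(self):
        # Phase 1: force the change list to the log, then vote yes.
        self.log.append("prepared")
        return True

    def commit(self):
        # Phase 2: apply the change. May be interrupted by a crash.
        self.log.append("commit")
        self.committed = True

    def recover(self, coordinator_decision):
        # On restart, an in-doubt participant ("prepared" logged but no
        # "commit") asks the coordinator for the outcome and finishes.
        if "prepared" in self.log and "commit" not in self.log:
            if coordinator_decision == "commit":
                self.commit()

def two_phase_commit(coordinator_log, participants, crash_after=None):
    # Phase 1: collect votes from every participant.
    if not all(p.prepare() for p in participants):
        coordinator_log.append("abort")
        return
    # The decision is made durable *before* any participant applies it.
    coordinator_log.append("commit")
    # Phase 2: tell everyone to apply; optionally simulate a crash partway.
    for i, p in enumerate(participants):
        if crash_after is not None and i >= crash_after:
            return  # power outage: remaining participants never hear "commit"
        p.commit()

# Crash after telling only the first database to commit:
db1, db2 = Participant("first"), Participant("second")
coordinator_log = []
two_phase_commit(coordinator_log, [db1, db2], crash_after=1)
assert db1.committed and not db2.committed   # momentarily out of sync

# On reboot, db2 finds itself in doubt and resolves via the logged decision:
db2.recover(coordinator_log[-1])
assert db1.committed and db2.committed       # consistent again
```

This is exactly the window Bas asks about below: between the two commit messages the databases briefly disagree, but because "all voted yes" was logged first, the lagging database deterministically finishes the commit on recovery rather than rolling back.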

Henrik
Thanks for the answer, but I'm still a bit puzzled. At some point both databases have to be told to apply their changes. Let's say there is a power outage after telling the first db to apply; then the databases are out of sync. Or am I missing something?
Bas Smit
I mean individually both databases are valid, but they are no longer in sync with each other.
Bas Smit
I've added more to my answer. See the Wikipedia link for more links and resources. In short: if you use TransactionScopes, it works without the databases going out of sync (they also use 2PC internally), so you should have nothing to worry about. There are more details that could be added to this answer, though, such as disk writes not actually being atomic, contrary to the assumptions database systems make, and what types of hard drives you ought to use in order to guarantee as much as you can (http://forums.bit-tech.net/showthread.php?t=146156)
Henrik
@Bas Smit: With respect, Henrik has already given you references which describe how 2PC works in detail. In particular, see the Bernstein reference at the bottom of the Wikipedia article. The questions you're asking are not new, and you can either (1) read up on 2PC if you want to know how recovery works or (2) just accept that these issues have been raised before. Henrik answered your question, and it shouldn't be necessary for him to spell out all the details of a fairly complex protocol.
Craig Stuntz
+2  A: 

With distributed transactions it can in fact happen that the databases become inconsistent. You said:

At some point both databases have to be told to apply their changes. Let's say there is a power outage after telling the first db to apply; then the databases are out of sync. Or am I missing something?

You aren't. I think this is known as the Two Generals' Problem. It provably cannot be prevented. The window of failure is, however, quite small.

usr
As far as I understand the Wikipedia article about the generals problem, given that a message and its acknowledgement are successfully exchanged (i.e. an arbitrary step is written to the tx-log), 2PC avoids inconsistencies. If your network goes down after one database has committed and the other has not yet, the first will have successfully prepared and so will the other (otherwise neither would have been told to commit), at which point the first database can safely assume the second database will commit when it comes back online.
Henrik