I'm looking for a structured approach to long-running (hours or more) transactions. As mentioned here, these types of interactions are usually handled by optimistic locking and manual merge strategies.

It would be very handy to have a more structured approach to this type of problem using standard transactions. Various long-running interactions, such as user registration and order confirmation, all have transaction-like semantics, and it is both error-prone and tedious to invent your own fragile manual rollback and/or time-out/clean-up strategies.

Taking an RDBMS as an example, I realize that keeping all these transactions open would carry a major performance cost. As an alternative, I could imagine a database supporting two isolation levels/strategies simultaneously, one for short-running and one for long-running conversations. Long-running conversations could then, for instance, have stricter limitations on data access so that they can take more time (read-only semantics on some data, optimistic locking semantics, etc.).

Are there any solutions which could do something similar?

A: 

I'd rather use a BPM tool for this kind of thing; BPM tools are explicitly intended to support long-running orchestrations. I can't elaborate right now, but I suggest checking Understanding BPM Servers. I'm quoting some parts below, but the whole paper is worth reading:

Managing an Orchestration's State

One of the biggest differences between an orchestration and the business services it uses is the time each takes to execute. A request to a typical service generates a reply within a few seconds. Because it commonly drives all or part of a business process, however, an orchestration may run for hours, days, or weeks, depending on how long the process takes to finish. What if human approval is required at some point in the process, for instance, and the person who must give her approval is on vacation? Because business processes can take a long time to complete, the orchestrations that control them can also run for a long time.

This long-running nature affects how an orchestration manages the in-memory information it maintains—the state—about a running process. If the orchestration is blocked for a significant period of time, keeping this state in memory doesn’t make much sense. Instead, a BPM server should provide a way for an orchestration’s state to be automatically written to disk, then restored again when the business process resumes, even if it’s days or weeks later.
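As an illustration of this checkpoint-and-restore idea (not how any particular BPM server does it), here is a minimal sketch assuming the orchestration state can be serialized to JSON and that a local file is an acceptable store; a real server would use a database or its own persistence layer. The `OrderProcess` class is hypothetical: it saves its state while blocked on an approval and is rebuilt from disk when the approval arrives.

```python
import json
import os
import tempfile
from dataclasses import dataclass, asdict, field

@dataclass
class OrderProcess:
    # Hypothetical orchestration state: the step reached and what has happened so far.
    order_id: str
    step: str = "awaiting_approval"
    history: list = field(default_factory=list)

    def checkpoint(self, path):
        # Write to a temp file, then rename, so a crash never leaves a half-written state.
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(asdict(self), f)
        os.replace(tmp, path)

    @classmethod
    def restore(cls, path):
        # Rebuild the in-memory orchestration from the last checkpoint.
        with open(path) as f:
            return cls(**json.load(f))

    def on_approval(self, approver):
        # Resume the business process where it left off.
        self.history.append(f"approved by {approver}")
        self.step = "shipping"

path = os.path.join(tempfile.mkdtemp(), "order-42.json")
OrderProcess("42").checkpoint(path)   # blocked: state goes to disk, memory can be freed
resumed = OrderProcess.restore(path)  # days later, possibly in another process
resumed.on_approval("alice")
resumed.checkpoint(path)
print(resumed.step, resumed.history)  # shipping ['approved by alice']
```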

State management illustrates another notable difference between BPM servers and application servers. Since supporting long-running business processes isn’t their primary purpose, application servers haven’t traditionally addressed this kind of state management. Because they are explicitly intended to support long-running orchestrations, however, BPM servers do provide this service.

Handling Transactions

Many business processes require the all-or-nothing behavior characterized by a transaction. For example, an orchestration driving a business process might need to invoke two business services and ensure that either both requests succeed or both fail. This kind of atomic transaction can be accomplished using a standard two-phase commit protocol, and it’s something that BPM servers typically support. In fact, application servers include this feature, so a BPM server built on an application server can offer this quite easily.
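To make the two-phase commit protocol concrete, here is a toy in-memory coordinator. It is only a sketch of the voting logic: there is no write-ahead log, no timeout handling and no recovery after a coordinator crash, all of which a real transaction manager must provide. `Participant` is a hypothetical stand-in for a resource such as a database.

```python
class Participant:
    """Hypothetical resource manager that can stage a change, then commit or abort it."""

    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = "idle"

    def prepare(self):
        # Phase 1: stage the change (a real resource would lock data and log here).
        self.state = "prepared" if self.will_vote_yes else "aborted"
        return self.will_vote_yes

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (voting): every participant must promise that it can commit.
    prepared = []
    for p in participants:
        if p.prepare():
            prepared.append(p)
        else:
            # A single "no" vote dooms the transaction: abort everyone already prepared.
            for q in prepared:
                q.abort()
            return False
    # Phase 2 (completion): all votes were "yes", so the decision is commit.
    for p in participants:
        p.commit()
    return True

a, b = Participant("bank_a"), Participant("bank_b")
print(two_phase_commit([a, b]), a.state, b.state)  # True committed committed
c, d = Participant("bank_c"), Participant("bank_d", will_vote_yes=False)
print(two_phase_commit([c, d]), c.state, d.state)  # False aborted aborted
```

The weakness the next paragraph describes sits between the two phases: a prepared participant holds its locks until the decision arrives, which is fine for milliseconds but not for a human approval that takes days.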

The nature of many business processes raises another issue, however. What if a particular process requires all-or-nothing behavior, but a traditional atomic transaction isn’t possible? Atomic transactions require locking data for the life of the transaction, something that isn’t a problem when the transaction is short. But suppose the services that must be bundled into an all-or-nothing group include one that requires human approval. Even if the required approver isn’t on vacation, the time it takes for a person to respond is likely far too long for data to remain locked. Or what if one service that must be in this transactional group doesn’t participate in atomic transactions? This isn’t a far-fetched worry, since many applications won’t let arbitrary clients lock their data.

To handle situations like these, a BPM server supports long-running transactions. Also called business activities and other names, long-running transactions handle errors not by rolling back all updates, but rather by executing some kind of compensating logic when an error occurs. For example, suppose a particular long-running transaction includes an atomic transaction that transfers money from one bank to another, followed by an operation that executes another application once the transfer has succeeded. If this final operation fails, the logic of the business process requires that the money transfer be undone. Yet the atomic transaction that performed this transfer has already committed. How can it be reversed? The answer is that compensating logic must run if a failure occurs, logic that might execute another atomic transaction to undo the effects of the transfer. A BPM server provides built-in facilities that allow the creator of an orchestration to define this compensating action, then have it automatically execute when a long-running transaction fails.
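The compensation mechanism described above (often called the saga pattern) fits in a few lines. This is an illustrative sketch, not any BPM server's API: each step is a hypothetical `(name, action, compensation)` tuple whose action commits on its own, and when a later step fails, the compensations of the already completed steps run in reverse order. The bank-transfer scenario from the quote is modeled with a plain dictionary.

```python
def run_long_running_transaction(steps):
    """steps: list of (name, action, compensation); each action commits independently."""
    completed = []
    for name, action, compensate in steps:
        try:
            action()
            completed.append((name, compensate))
        except Exception:
            # Committed work cannot be rolled back, so run compensations newest-first.
            for _, undo in reversed(completed):
                undo()
            return False
    return True

# Hypothetical balances mirroring the money-transfer example in the quote.
balances = {"bank_a": 100, "bank_b": 0}

def transfer():
    balances["bank_a"] -= 30
    balances["bank_b"] += 30

def reverse_transfer():
    # Compensation: a new, separate transaction that moves the money back.
    balances["bank_b"] -= 30
    balances["bank_a"] += 30

def run_other_application():
    raise RuntimeError("downstream application failed")

ok = run_long_running_transaction([
    ("transfer", transfer, reverse_transfer),
    ("notify", run_other_application, lambda: None),
])
print(ok, balances)  # False {'bank_a': 100, 'bank_b': 0}
```

A production version would also have to persist which steps completed and retry compensations that themselves fail; the sketch assumes compensations always succeed.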

While compensation is useful when atomic transactions aren’t possible, it’s not without problems. Suppose an orchestration modifies some data in the early part of a long-running transaction, for instance, then runs a compensating operation later to change this data back to its original state. What happens if some other application accesses that data in between these two events? This second application may well use data that’s ultimately deemed to be incorrect in making business decisions, such as computing credit risk. Or think about operations for which there is no obvious compensation. If an orchestration causes a missile to be launched, there’s no way for compensating code in that orchestration to reverse this. Yet while compensation isn’t a perfect solution, it is nevertheless the right approach for an important category of problems faced by business processes.
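The isolation problem in this paragraph is easy to reproduce: anything that reads between a forward step and its compensation sees a value that is later undone. In this minimal sketch, `credit_check` is a hypothetical stand-in for the credit-risk computation the quote mentions.

```python
balances = {"customer": 100}
decisions = []

def credit_check():
    # Hypothetical concurrent reader that makes a business decision from current data.
    decisions.append("approve" if balances["customer"] >= 150 else "decline")

balances["customer"] += 100  # step 1 of a long-running transaction commits
credit_check()               # another application reads the intermediate state
balances["customer"] -= 100  # a later step fails, so the compensation runs

print(balances["customer"], decisions)  # 100 ['approve']
```

The balance ends up correct, but the approval was granted on data that was later deemed incorrect. Commonly used mitigations include marking intermediate records as pending so that other readers can treat them with care, and ordering steps so that the ones with no compensation (the missile launch) run last.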

Pascal Thivent
Could you elaborate on how that helps? Thanks
disown
A: 

RDBMS ACID transactions always belong to short, atomic, local operations. Distributed applications, autonomous services and loosely coupled components use different strategies, such as retention and compensating transactions.

Pat Helland's papers are a good read on this topic; he has been teaching about this subject since the '80s. See for instance Architecture of an Autonomous Application or Fiefdoms and Emissaries.

Remus Rusanu
I read both of these articles, and although I fully agree with the concepts and found them quite funny :), I was looking more for tools that make these things happen. I find that the compensating operations could be implemented with the help of the datasource, for instance (they look a lot like transactions). The optimistic locking strategies employed to deal with stale data from the Emissaries could be supported by the datasource as well. So in essence, thank you for the great articles, but are there any tools that support this type of architecture, or do you have to do it all manually?
disown
Not really. The tool I've been working on for years as a dev, Service Broker, had these concepts in its design; see http://bit.ly/4DDTZ5 for example. But that is a very narrow and specialized example, far from a generic solution supported by the development toolset. In the MS stack, WCF and the Sync Framework have such objectives on their agenda, but I wouldn't venture to claim that an actual solution exists (ahem... or that it ever will). The JEE stack seems to grok these problems better, but I'm not familiar enough with that tool stack to give a verdict.
Remus Rusanu
Although I found both your answer and Pascal's good, I'm going to accept yours, since it illustrated the problems better, and I think BPM is only part of the answer. The real problem seems to be that you do need to build your app to support this behavior, as seen with temporary flight bookings, reservations in bank systems, etc. The links you provided are a good starting point.
disown
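The "temporary flight booking" behavior mentioned in the last comment is usually built as a reservation with an expiry: the resource is held tentatively, confirmed if the conversation completes, and released automatically if it times out. A minimal in-memory sketch follows; `SeatInventory` is hypothetical, and the clock is passed in explicitly so expiry is easy to follow. A real system would persist the holds and run the clean-up from a scheduler.

```python
from dataclasses import dataclass, field

@dataclass
class SeatInventory:
    # Hypothetical inventory whose seats can be held tentatively before purchase.
    free: int
    hold_seconds: float = 15 * 60
    holds: dict = field(default_factory=dict)  # reservation id -> expiry timestamp
    next_id: int = 0

    def expire(self, now):
        # Clean-up: release every hold whose time-out has passed.
        for rid, expires_at in list(self.holds.items()):
            if expires_at <= now:
                del self.holds[rid]
                self.free += 1

    def reserve(self, now):
        # Tentatively take a seat; returns a reservation id, or None if sold out.
        self.expire(now)
        if self.free == 0:
            return None
        self.free -= 1
        self.next_id += 1
        self.holds[self.next_id] = now + self.hold_seconds
        return self.next_id

    def confirm(self, rid, now):
        # The conversation finished in time: the tentative hold becomes permanent.
        self.expire(now)
        return self.holds.pop(rid, None) is not None

inv = SeatInventory(free=1)
r1 = inv.reserve(now=0)           # seat held until t=900
print(inv.reserve(now=60))        # None: the only seat is on hold
print(inv.confirm(r1, now=2000))  # False: the hold expired and the seat was released
r2 = inv.reserve(now=2000)        # the seat is available again
print(inv.confirm(r2, now=2100))  # True
```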