Transactional programming is, in this day and age, a staple of modern development. Concurrency and fault tolerance are critical to an application's longevity and, rightly so, transactional logic has become easy to implement. As applications grow, though, transactional code tends to become more and more of a burden on the scalability of the application, and once you cross into distributed transactions and mirrored data sets the issues become very complicated.

I'm curious at what point, in data size or application complexity, transactions frequently start becoming a source of issues (timeouts, deadlocks, performance problems in mission-critical code, etc.) that are more bothersome to fix, troubleshoot, or work around than designing a data model that is more fault-tolerant in itself, or using other means to ensure data integrity. Also, what design patterns serve to minimize these impacts or make standard transactional logic obsolete or a non-issue?

--

EDIT: We've got some answers of reasonable quality so far, but I think I'll post an answer myself to bring up some of the things I've heard about, in the hope of inspiring some additional creativity; most of the responses I'm getting are pessimistic views of the problem.

Another important note is that not all deadlocks are the result of poorly coded procedures; sometimes mission-critical operations depend on the same resources in different orders, or complex joins in different queries step on each other. This can seem unavoidable, but I've been part of reworking workflows to enforce an execution order that is less likely to cause deadlocks.

+2  A: 

I don't think any design pattern can solve this issue by itself. Good database design, good stored procedure programming, and especially learning how to keep your transactions short will ease most of the problems. There is no method that is 100% guaranteed to avoid problems, though.

In basically every case I've seen in my career, though, deadlocks and slowdowns were solved by fixing the stored procedures:

  • making sure all tables are accessed in a consistent order prevents deadlocks (see the sketch after this list)
  • fixing indexes and statistics makes everything faster (and hence diminishes the chance of a deadlock)
  • sometimes there was no real need for a transaction; it just "looked" like there was
  • sometimes transactions could be eliminated by turning multi-statement stored procedures into single-statement ones
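
As a rough illustration of the first bullet, here is a minimal T-SQL sketch (the Accounts and AuditLog tables and the procedure names are hypothetical, not from the answer): both procedures touch the same two tables in the same order, so neither can end up holding a lock on Accounts while waiting on AuditLog in the opposite order.

    -- Both procedures lock Accounts first, then AuditLog, so they cannot
    -- deadlock against each other over these two tables.
    CREATE PROCEDURE dbo.DebitAccount @AccountId INT, @Amount MONEY
    AS
    BEGIN
        BEGIN TRANSACTION;
            UPDATE dbo.Accounts                          -- lock order 1: Accounts
            SET Balance = Balance - @Amount
            WHERE AccountId = @AccountId;

            INSERT INTO dbo.AuditLog (AccountId, Delta)  -- lock order 2: AuditLog
            VALUES (@AccountId, -@Amount);
        COMMIT TRANSACTION;
    END;
    GO

    CREATE PROCEDURE dbo.CreditAccount @AccountId INT, @Amount MONEY
    AS
    BEGIN
        BEGIN TRANSACTION;
            UPDATE dbo.Accounts                          -- same order 1: Accounts
            SET Balance = Balance + @Amount
            WHERE AccountId = @AccountId;

            INSERT INTO dbo.AuditLog (AccountId, Delta)  -- same order 2: AuditLog
            VALUES (@AccountId, @Amount);
        COMMIT TRANSACTION;
    END;
    GO
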
Sklivvz
+2  A: 

The use of shared resources is wrong in the long run, because by reusing an existing environment you create more and more possible interactions. Just look at the busy beavers :) The Erlang approach, with isolated processes passing messages instead of sharing state, is the right way to produce fault-tolerant and easily verifiable systems.

But transactional memory is essential for many applications in widespread use. If you consider a bank with its millions of customers, for example, you can't just copy the data for the sake of efficiency.

I think monads are a cool concept for handling the difficult problem of changing state.

Roman Glass
A: 

If you are talking about 'cloud computing' here, the answer is to localize each transaction to the place in the cloud where it happens. There is no need for the entire cloud to be consistent, as that would kill performance (as you noted). Simply keep track of what is changed and where, and handle multiple small transactions as changes propagate through the system. The situation where user A updates record R and user B at the other end of the cloud does not see it (yet) is the same as the situation, in today's strict-transactional environment, where user A simply hasn't made the change yet.

This could lead to discrepancies in an update-heavy system, so systems should be architected to do as few in-place updates as possible: move toward aggregating data and pull out the aggregates only when the exact figure is critical (i.e. move the requirement for consistency from write time to critical-read time).
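
To make the aggregation idea concrete, here is a minimal T-SQL sketch (the LedgerEntries table and its columns are hypothetical): writers only append rows, so they never contend over a single "current balance" row, and the exact, consistent figure is computed only at the critical read.

    -- Writers append facts instead of updating a shared running total.
    INSERT INTO dbo.LedgerEntries (AccountId, Delta, CreatedAt)
    VALUES (@AccountId, @Amount, SYSUTCDATETIME());

    -- The exact figure is aggregated only when it is actually needed.
    SELECT SUM(Delta) AS CurrentBalance
    FROM dbo.LedgerEntries
    WHERE AccountId = @AccountId;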

Well, that's just my POV. It's hard to conceive of a solution that is application-agnostic in this case.

Milan Babuškov
A: 

Try to make changes at the database level in the smallest possible number of statements.

The general rule is to lock a resource for the shortest possible time. Using T-SQL, PL/SQL, Java on Oracle, or anything similar, you can reduce the time each transaction holds a lock on a shared resource. In fact, transactions in the database are optimized with row-level locks, multi-versioning, and other intelligent techniques. If you do the work in a database transaction you also save the network latency, apart from the overhead of layers like ODBC/JDBC/OLE DB.
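
As a rough sketch of keeping the lock window short (the Inventory table and column names are hypothetical): a single set-based statement holds its row locks only for the duration of that statement, instead of a client-side SELECT-then-UPDATE that keeps a transaction open across network round trips.

    -- One short, set-based statement: the check and the change happen
    -- atomically, and the row lock lasts only as long as this UPDATE.
    UPDATE dbo.Inventory
    SET QuantityOnHand = QuantityOnHand - @OrderedQty
    WHERE ProductId = @ProductId
      AND QuantityOnHand >= @OrderedQty;  -- guard clause instead of a separate SELECT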

Sometimes programmers try to get the good things a database offers (it is transactional, parallel, distributed) but keep a cache of the data. Then they have to re-implement some of the database's features by hand.

borjab
A: 

One approach I've heard of is a versioned, insert-only model where no updates ever occur. During selects, the version is used to pick only the latest rows. One downside I know of with this approach is that the database can grow very large very quickly.
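
A minimal T-SQL sketch of what such a model could look like (the Customers table, the Version column, and the key names are hypothetical, not taken from the answer): every change inserts a new row, and readers pick the highest version per key.

    -- "Update" a customer by inserting a new version of the row.
    INSERT INTO dbo.Customers (CustomerId, Version, Name, Email)
    VALUES (@CustomerId, @NewVersion, @Name, @Email);

    -- Read only the latest version of each customer.
    SELECT CustomerId, Name, Email
    FROM (
        SELECT CustomerId, Name, Email,
               ROW_NUMBER() OVER (PARTITION BY CustomerId
                                  ORDER BY Version DESC) AS rn
        FROM dbo.Customers
    ) AS latest
    WHERE rn = 1;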

I also know that some solutions, such as FogBugz, don't use enforced foreign keys, which I believe would also help mitigate some of these problems: the SQL query plan can lock linked tables during selects or updates even if no data in them is changing, and if a highly contended table gets locked, the chance of a deadlock or timeout increases.

I don't know much about these approaches though since I've never used them, so I assume there are pros and cons to each that I'm not aware of, as well as some other techniques I've never heard about.

I've also been looking into some of the material from Carlo Pescio's recent post, which I unfortunately haven't had enough time to do justice to yet, but it seems very interesting.

TheXenocide