views: 301

answers: 5
I am looking for a good strategy for dealing with database deadlocks from within a Java 6 application; several parallel threads could, potentially, write into the same table at the same time. The database (Ingres RDBMS) will randomly kill one of the sessions if it detects a deadlock.

What would be an acceptable technique to deal with the deadlock situation, given the following requirements?

  • the total elapsed time should be kept as small as reasonably possible
  • killing a session will incur a significant (measurable) rollback time
  • threads have no way to communicate with each other, i.e. the strategy should be autonomous

So far, the strategy I came up with is something along these lines:

short attempts = 0;
boolean success = false;
long delayMs = 0;

Random random = new Random();
do {
    try {
        //insert loads of records in table 'x'
        success = true;
    } catch (ConcurrencyFailureException e) {
        attempts++;
        success = false;
        delayMs = 1000 * attempts + random.nextInt(1000 * attempts);

        try {
            Thread.sleep(delayMs);
        } catch (InterruptedException ie) {
        }
    }
} while (!success);

Can it be improved in any way, e.g. by waiting for a fixed (magic number) amount of seconds? Is there a different strategy that will produce better results?

Note: Several database level techniques will be used to ensure deadlocks are, in practice, very rare. Also, the application will attempt to avoid scheduling threads that write into the same table at the same time. The situation above will be just a “worst case scenario”.

Note: The table in which records are inserted is organised as a partitioned heap table and has no indexes; each thread will insert records into its own partition.

A: 

That's the way we did it. Loop and retry the transaction until it finishes.

We didn't mess with random delays.

Also, we did the commit inside the try block and the rollback in the exception handler.
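
Roughly, in Java/JDBC terms (ours was C against Ingres; insertRecords here is just a hypothetical placeholder for the actual bulk insert, and the catch is deliberately broad), the shape is something like:

import java.sql.Connection;
import java.sql.SQLException;

public class RetryingLoader {

    // Loop and retry: commit inside the try, rollback in the exception
    // handler. Assumes autocommit is off on the connection.
    public void loadWithRetry(Connection conn) throws SQLException {
        boolean success = false;
        do {
            try {
                insertRecords(conn); // bulk insert into table 'x'
                conn.commit();       // commit inside the try block
                success = true;
            } catch (SQLException deadlockVictim) {
                conn.rollback();     // rollback in the exception handler, "just to be sure"
                success = false;
            }
        } while (!success);
    }

    // Hypothetical placeholder for the actual inserts.
    private void insertRecords(Connection conn) throws SQLException {
        // ... execute the INSERT statements for table 'x' here ...
    }
}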

When you have multiple lockable resources and multiple concurrent transactions, deadlock is unavoidable. It's a logical consequence of contention for locks.

If you avoid contention for locks (i.e., pessimistic table-level locking) then you also tend to prevent concurrency. If you can define transactions which don't contend for locks, you can avoid deadlock. Concurrent access to the same table, however, is pretty much the definition of deadlock.

When loading, inserts (especially into a HEAP table) can often proceed in parallel without many contention issues. If you delay building the indices, then there are no other updates going on during the insert.

So you may be able to avoid deadlocks by dropping the indexes, changing the organization to a heap, loading with multiple concurrent processes (or threads; it's usually faster to have multiple processes), and then building your indices (and possibly reorganizing the table).

When doing updates or deletes, not much helps.

S.Lott
In my simulations (20 threads) a zero delay will trigger a huge number of deadlocks; a large delay will actually significantly improve the overall elapsed time. Autocommit is set to on, but a database deadlock will also involve an automatic rollback.
Adrian
@Adrian: We had fairly complex error handling, so we did the rollback "just to be sure". Also we were working in C, so it was a "virtual exception". Finally, we had some C/Ingres n00bs doing the coding, so we did that in case they had hidden little logic errors.
S.Lott
@Adrian: In our actual practice, the presence or absence of a delay didn't make any practical difference. A heavily loaded OS introduces its own randomized delays. YMMV. I'm not debating your simulation. I'm telling you what we *did*. We didn't mess with random delays.
S.Lott
@S.Lott We do use a heap table (actually a partitioned heap table) with no indexes
Adrian
+1  A: 

If you don't need to have concurrent access to the database, a simple solution might be to remove it and use a task processing queue to update the database instead, serialising access to the database via the queue. I realise this will introduce an asynchronous element to your application, and so would not be suitable for most user-initiated applications or online web apps, but it might be worth considering for a batch/offline type of application (I realise this is probably not the answer you're looking for, though).
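
For example (just a sketch, assuming a single worker thread is an acceptable serialisation point; the class and method names are illustrative):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SerialisedDbWriter {

    // Single worker thread: every database write is funnelled through this
    // queue, so no two inserts run concurrently and they cannot deadlock
    // against each other.
    private final ExecutorService dbQueue = Executors.newSingleThreadExecutor();

    // Producer threads call this instead of writing to the database directly.
    public Future<?> submitInsert(Runnable insertTask) {
        return dbQueue.submit(insertTask); // queued and executed one at a time
    }

    public void shutdown() {
        dbQueue.shutdown();
    }
}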

Joel
A processing queue will slow down the application considerably; avoiding deadlocks at all costs is not an option. The production server is a 4-processor Unix server and should handle some number of parallel threads easily.
Adrian
+7  A: 

A commonly used approach is some form of exponential back-off. Rather than your 1000*attempts+random approach, make the delay an exponential function of the number of attempts. This ensures minimal latency in the first one or two attempts, where it might have just been bad luck that you deadlocked, but gives you much bigger delays later, when it is clear that the connection really is congested.
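
As a rough sketch (the base delay, the cap and the jitter scheme here are illustrative choices, not recommendations; this would replace the delayMs calculation in your loop):

import java.util.Random;

public class ExponentialBackoff {

    private static final long BASE_DELAY_MS = 50;    // keeps the first retries cheap
    private static final long MAX_DELAY_MS  = 30000; // cap so the delay cannot grow without bound

    private final Random random = new Random();

    // Delay before retry number 'attempt' (1, 2, 3, ...): doubles each time,
    // capped, with random jitter so the threads do not retry in lockstep.
    public long delayFor(int attempt) {
        long exponential = BASE_DELAY_MS * (1L << Math.min(attempt, 20));
        long capped = Math.min(exponential, MAX_DELAY_MS);
        return capped / 2 + (long) (random.nextDouble() * (capped / 2));
    }
}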

Of course, another approach would be to try to arrange your database accesses so that deadlocks are less likely to occur. But without knowing what your queries do (and how, and when they're executed), it's impossible to say whether that can be done.

jalf
Exponential function sounds good - I will try to simulate it! As I mentioned in the note, the application design will aim to avoid deadlocks, including arranging the database access favourably. But there is no guarantee that, in corner cases, a large number of deadlocks won't occur.
Adrian
Can you provide any technical reference for the "exponential back-off" technique you mention?
Adrian
It is commonly used in network protocols to avoid congestion. Check the wiki article: http://en.wikipedia.org/wiki/Exponential_backoff But the basic idea is simple. You simply use some kind of exponential function to determine the delay at each retry attempt. The exact details can be tweaked to suit your purposes. Of course the simplest possible implementation would be to delay for 2^n ms, where n is the number of retries so far. But perhaps you think that grows too slowly to begin with, or starts too low, or grows too quickly. Then you just add a multiplier, or add something to n.
jalf
The basic idea is simply that the first 1-3 tries or so should be cheap and have very short delays. Most likely they were just bad luck, and you can retry again almost immediately and you'll avoid the deadlock. But if you keep failing, it means the connection is congested, and you have to back off and retry less often, to reduce the pressure on the connection. If the line is congested, all clients will do this, and so the congestion will resolve itself as all clients wait longer and longer until a balance is found.
jalf
A: 

With a database like Ingres you will always get some deadlocks, so you have to assume that any insert, update or delete will fail and have a retry strategy in place (as in your example). You should design your database so that contention is minimised and deadlocks only happen rarely. If you are continually getting transactions failing even after several retries, then this is a sign that you'll have to do some major database redesign (or move to a system like Oracle where it is usually possible to design applications to avoid deadlocks by suitable use of row-level locking).

Chris Card
no problem, but I would add that serialising transactions may be a quicker option if you are getting too many deadlocks/rollbacks/retries
Chris Card
Moving to another RDBMS like Oracle is not an option for this project. Serialising was tried and it is quite a bit slower than a “normal” run with a low number of deadlocks.
Adrian
A: 

How about this?

short attempts = 0;
boolean success = false;
long delayMs = 0;

Random random = new Random();
do {
    try {
        synchronized (ClassName.class) {
            //insert loads of records in table 'x'
        }
        success = true;
    } catch (ConcurrencyFailureException e) {
        attempts++;
        success = false;
        delayMs = 1000 * attempts + random.nextInt(1000 * attempts);

        try {
            Thread.sleep(delayMs);
        } catch (InterruptedException ie) {
        }
    }
} while (!success);
Upul