ansaurus

Question

How do I ensure data consistency in this concurrent situation?

Answer 1

+3 A:

Create a unique constraint on name column in database.

Tadeusz Kopec 2010-03-12 13:11:35

In practice, it's slightly more complicated...

ewernli 2010-03-12 13:30:11

Answer 2

+2 A:

Add a unique constraint for the name column.

lajuette 2010-03-12 13:13:16

Plus correctly detect the unique constraint issue from the Exception. This can be non-trivial depending on the info the JDBC driver returns in the Exception. Spring does this in its JDBC module so sneak a peek at that code if you get stuck.

Mike Q 2010-03-12 13:17:29

In practice, it's slightly more complicated...

ewernli 2010-03-12 13:30:51

Answer 3

+5 A:

Assuming there is a unique constraint on the name column, each insert will acquire a lock. Any thread that attempts to insert it a second time concurrently will wait until the 1st insert either succeeds or fails (tx commit or rolls back).

If the 1st transaction succeeds, 2nd transaction will fail with with a unique key violation. Then you know it exists already.

If there is one insert per transaction, it'ok. If there are more than 1 insert per transaction, you may deadlock.

Each thread will pass a String name - where that name exists in the table, the database should return the id for the row, where the name doesn't already exist, the name should be inserted and the id returned.

So all in all, the algo is like this:

1 read row with name
   2.1 if found, return row id
   2.2 if not found, attempt to insert
      2.2.1 if insert succeeds, return new row id
      2.2.2 if insert fails with unique constraint violation
          2.2.2.1 read row with name
          2.2.2.2 read should succeed this time, so return row id

Because there can be a high contention on the unique index, the insert may block for some time. In which case the transaction may time out. Make some stress test, and tune the configuration until it works correctly with your load.

Also, you should check if you get a unique constraint violation exception or some other exception.

And again, this works only if there is one insert per transaction, otherwise it may deadlock.

Also, you can try to read the row at step 1 with "select * for update". In this case, it waits until a concurrent insert either commits or succeeds. This can slightly reduce the amount of error at step 2.2.2 due to the contention on the index.

ewernli 2010-03-12 13:13:37

thanks, great answer - one more thing, can i detect that unique constraint exception in sql? or do i have to let it bubble back to be detected in java?

MalcomTucker 2010-03-12 16:10:45

oh, and a another question - am i right in thinking that mysql will cache query results, so in theory if a name has already been inserted and returned once, i should be able to leverage the cache to return names that exist?

MalcomTucker 2010-03-12 16:15:36

and sorry, one more question - I assume the algo you have outlined above can all be done in sql but do i need to manually manage the table locks, or will mysql do that for me? thanks for the advice :)

MalcomTucker 2010-03-12 16:16:55

I used to check the type of exception with JDBC. It's database specific and you need to check the error code in the exception (SQLException#getErrorCode). I know it's ORA-0001 for Oracle, but don't know the code for MySQL. Just produce the error and you will get it. Though I never did it, I guess you can also check the error in a stored procedure, if that's what you mean (See http://www.devshed.com/c/a/MySQL/Error-Handling/2/).

ewernli 2010-03-12 16:26:12

For the other questions, I don't know the detail of MySQL caching, but that should be transparent to you. The locks are managed by the database if you have the proper index.

ewernli 2010-03-12 16:27:44

ok - if the sql is throwing errors back to java and then i'm having to re-insert is that not going to be really inefficient? what if i used an SP and explicitly took locks, so something like - readlock, select where name = ?, if null, writelock, insert, release locks ..

MalcomTucker 2010-03-12 16:37:25

You don't re-insert rows. If the insert fails, it means the row has been inserted concurrently so you only need to read it (step 2.2.2.1). You can not lock on something that does not exist, so you can't prevent altogether that two inserts will happen concurrently and one will fail. This error handling implies a bit of overhead. What you can try to do is to read the row (step 1) with a lock, so that if the row has already been insert concurrently *but was not yet committed*, you wait until it's committed and don't generate an error.

ewernli 2010-03-13 09:17:15

Performance of this approach in practice depends on the load and the distribution of data. Make some tests to tune the number of threads.

ewernli 2010-03-13 09:21:49

ansaurus

tags:

views:

answers:

How do I ensure data consistency in this concurrent situation?

related questions