views:

105

answers:

3

The problem is this:

  • I have multiple competing threads (100+) that need to access one database table
  • Each thread will pass a String name - where that name exists in the table, the database should return the id for the row, where the name doesn't already exist, the name should be inserted and the id returned.
  • There can only ever be one instance of name in the database - ie. name must be unique

How do I ensure that thread one doesn't insert name1 at the same time as thread two also tries to insert name1? In other words, how do I guarantee the uniqueness of name in a concurrent environment? This also needs to be as efficient as possible - this has the potential to be a serious bottleneck.

I am using MySQL and Java.

Thanks

+3  A: 

Create a unique constraint on name column in database.

Tadeusz Kopec
In practice, it's slightly more complicated...
ewernli
+2  A: 

Add a unique constraint for the name column.

lajuette
Plus correctly detect the unique constraint issue from the Exception. This can be non-trivial depending on the info the JDBC driver returns in the Exception. Spring does this in its JDBC module so sneak a peek at that code if you get stuck.
Mike Q
In practice, it's slightly more complicated...
ewernli
+5  A: 

Assuming there is a unique constraint on the name column, each insert will acquire a lock. Any thread that attempts to insert it a second time concurrently will wait until the 1st insert either succeeds or fails (tx commit or rolls back).

If the 1st transaction succeeds, 2nd transaction will fail with with a unique key violation. Then you know it exists already.

If there is one insert per transaction, it'ok. If there are more than 1 insert per transaction, you may deadlock.

Each thread will pass a String name - where that name exists in the table, the database should return the id for the row, where the name doesn't already exist, the name should be inserted and the id returned.

So all in all, the algo is like this:

1 read row with name
   2.1 if found, return row id
   2.2 if not found, attempt to insert
      2.2.1 if insert succeeds, return new row id
      2.2.2 if insert fails with unique constraint violation
          2.2.2.1 read row with name
          2.2.2.2 read should succeed this time, so return row id

Because there can be a high contention on the unique index, the insert may block for some time. In which case the transaction may time out. Make some stress test, and tune the configuration until it works correctly with your load.

Also, you should check if you get a unique constraint violation exception or some other exception.

And again, this works only if there is one insert per transaction, otherwise it may deadlock.


Also, you can try to read the row at step 1 with "select * for update". In this case, it waits until a concurrent insert either commits or succeeds. This can slightly reduce the amount of error at step 2.2.2 due to the contention on the index.

ewernli
thanks, great answer - one more thing, can i detect that unique constraint exception in sql? or do i have to let it bubble back to be detected in java?
MalcomTucker
oh, and a another question - am i right in thinking that mysql will cache query results, so in theory if a name has already been inserted and returned once, i should be able to leverage the cache to return names that exist?
MalcomTucker
and sorry, one more question - I assume the algo you have outlined above can all be done in sql but do i need to manually manage the table locks, or will mysql do that for me? thanks for the advice :)
MalcomTucker
I used to check the type of exception with JDBC. It's database specific and you need to check the error code in the exception (SQLException#getErrorCode). I know it's ORA-0001 for Oracle, but don't know the code for MySQL. Just produce the error and you will get it. Though I never did it, I guess you can also check the error in a stored procedure, if that's what you mean (See http://www.devshed.com/c/a/MySQL/Error-Handling/2/).
ewernli
For the other questions, I don't know the detail of MySQL caching, but that should be transparent to you. The locks are managed by the database if you have the proper index.
ewernli
ok - if the sql is throwing errors back to java and then i'm having to re-insert is that not going to be really inefficient? what if i used an SP and explicitly took locks, so something like - readlock, select where name = ?, if null, writelock, insert, release locks ..
MalcomTucker
You don't re-insert rows. If the insert fails, it means the row has been inserted concurrently so you only need to read it (step 2.2.2.1). You can not lock on something that does not exist, so you can't prevent altogether that two inserts will happen concurrently and one will fail. This error handling implies a bit of overhead. What you can try to do is to read the row (step 1) with a lock, so that if the row has already been insert concurrently *but was not yet committed*, you wait until it's committed and don't generate an error.
ewernli
Performance of this approach in practice depends on the load and the distribution of data. Make some tests to tune the number of threads.
ewernli