How do you stop race conditions in MySQL? The problem at hand is caused by a simple algorithm:

  1. select a row from table
  2. if it doesn't exist, insert it

and then either you get a duplicate row, or if you prevent it via unique/primary keys, an error.
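
In SQL terms, the pattern looks something like this (the items table and id value are just examples):

-- step 1: look for the row
SELECT id FROM items WHERE id = 42;

-- step 2: if the SELECT returned nothing, insert it.
-- Another connection can run the same SELECT in between,
-- also see no row, and insert a duplicate.
INSERT INTO items (id) VALUES (42);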

Now normally I'd think transactions help here, but because the row doesn't exist yet, the transaction doesn't actually help (or am I missing something?).

LOCK TABLES sounds like overkill, especially if the table is updated multiple times per second.

The only other solution I can think of is GET_LOCK() for every different id, but isn't there a better way? Doesn't that have scalability issues as well? Also, doing it for every table seems a bit unnatural, since this sounds like a very common problem in high-concurrency databases to me.
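
Concretely, the GET_LOCK() approach I have in mind would look something like this (the items table, lock name, and 10-second timeout are made up for illustration):

-- serialize all writers of this particular id behind a named lock;
-- GET_LOCK() returns 1 on success, 0 on timeout
SELECT GET_LOCK('items_42', 10);

SELECT id FROM items WHERE id = 42;
-- ...insert only if the SELECT came back empty...
INSERT INTO items (id) VALUES (42);

SELECT RELEASE_LOCK('items_42');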

+5  A: 

What you want is LOCK TABLES,

or, if that seems excessive, INSERT IGNORE with a check that the row was actually inserted.

If you use the IGNORE keyword, errors that occur while executing the INSERT statement are treated as warnings instead.
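
A sketch of both options (the items table is hypothetical):

-- the heavyweight option: serialize all writers behind a table lock
LOCK TABLES items WRITE;
-- ...select, and insert only if missing...
UNLOCK TABLES;

-- the lighter option: let duplicate inserts become no-ops, then check
INSERT IGNORE INTO items (id) VALUES (42);
-- ROW_COUNT() is 1 if the row went in, 0 if it was ignored as a duplicate
SELECT ROW_COUNT();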

Ken
+2  A: 

It seems to me you should have a unique index on your id column, so a repeated insert would trigger an error instead of being blindly accepted again.

That can be done by defining the id as a primary key or using a unique index by itself.
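
For instance (again with a hypothetical items table):

-- either make id the primary key...
ALTER TABLE items ADD PRIMARY KEY (id);

-- ...or give it a standalone unique index
ALTER TABLE items ADD UNIQUE INDEX uniq_items_id (id);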

I think the first question you need to ask is: why do you have many threads doing the exact SAME work? Why would they have to insert the exact same row?

Once that is answered, I think that just ignoring the errors will be the most performant solution, but measure both approaches (GET_LOCK vs. ignoring errors) and see for yourself.

There is no other way that I know of. Why do you want to avoid errors? You still have to code for the case when another type of error occurs.

As staticsan says, transactions do help, but since they are usually implicit, if two inserts are run by different threads, they will both be inside implicit transactions and see consistent views of the database.

Vinko Vrsalovic
Well yes, of course we've got unique indexes etc... in fact, that's how we realised the problem exists: the errors triggered by the unique index.
tpk
That's how it's supposed to work: you prepare for transactions to fail... in this case it seems very easy: on a dup key error, ignore it, because the row already exists. A full table/row lock may be more of a performance hit than just ignoring errors when they occur. Measure it, though.
Vinko Vrsalovic
+2  A: 

On a technical level, a transaction will help here because other threads won't see the new row until you commit the transaction.

But in practice that doesn't solve the problem - it only moves it. Your application now needs to check whether the commit fails and decide what to do. I would normally have it roll back what you did and restart the transaction, because now the row will be visible. This is how transaction-based programming is supposed to work.
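
A sketch of that rollback-and-restart pattern (assuming MySQL's duplicate-key error, ER_DUP_ENTRY / 1062, and a hypothetical items table):

START TRANSACTION;
INSERT INTO items (id) VALUES (42);
COMMIT;

-- if the INSERT fails with ER_DUP_ENTRY (error 1062), ROLLBACK and
-- restart: the committed row is visible now, so a plain
-- SELECT ... FROM items WHERE id = 42 finds it and the insert is skipped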

staticsan
It's worth noting that transactions (depending on the isolation level and so on) also involve locking; it's better to leave the locking to the database infrastructure than to do it yourself.
Vinko Vrsalovic
A: 

I ran into the same problem and searched the net for a while :)

Finally I came up with a solution similar to the method used for creating filesystem objects in shared (temporary) directories, i.e. securely opening temporary files:

$exists = $success = false;
do {
    $exists = check(); // select the row from the table
    if (!$exists) {
        $success = create_record(); // returns true, or an error code on failure
        if ($success === true) {
            $exists = true;
        } else if ($success != ERROR_DUP_ROW) {
            log_error("failed to create row, and not because of DUP_ROW!");
            break;
        } else {
            // another process has probably created the record already,
            // so loop around and check again whether it exists
        }
    }
} while (!$exists);

Don't be afraid of the busy loop - normally it will execute once or twice.

xvga
A: 

Locking the entire table is indeed overkill. To get the effect you want, you need something the literature calls "predicate locks". No one has ever seen those except printed on the paper that academic studies are published on. The next best thing are locks on the "access paths" to the data (in some DBMSs: "page locks").

Some non-SQL systems allow you to do both (1) and (2) in one single statement, which more or less means that the potential race condition arising from your OS suspending your execution thread right between (1) and (2) is entirely eliminated.

Nevertheless, in the absence of predicate locks, such systems still need to resort to some kind of locking scheme, and the finer the "granularity" (/"scope") of the locks they take, the better for concurrency.

(And to conclude: some DBMSs - especially the ones you don't have to pay for - indeed offer no finer lock granularity than "the entire table".)
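
For comparison, MySQL itself has single-statement forms of the same check-then-insert idea; a sketch, with a hypothetical items table and hits column:

-- insert the row, or bump a counter if it already exists;
-- one atomic statement either way
INSERT INTO items (id, hits) VALUES (42, 1)
  ON DUPLICATE KEY UPDATE hits = hits + 1;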

Erwin Smout