views:

428

answers:

7

This is more of a correctness question. Say I have a table with a primary key column in my database. In my DAO code I have a function called insertRow(string key) that will return true if the key doesn't exist in the table and insert a new row with the key. Otherwise, if a row already exists with that key it returns false. Is it better/worse to have insertRow first check for the existence of the key or just go ahead and do the insert and catch the duplicate key error? Or is saving on a single select statement too trivial an optimization to even bother worrying about?

So in sudo code:

boolean insertRow(String key){
    //potentially a select + insert
    if(select count(*) from mytable where key = "somekey" == 0){
       insert into mytable values("somekey")
       return true;
    }
    return false;
}

or

  boolean insertRow(String key){
    try{
       //always just 1 insert
       insert into mytable values("somekey")
       return true;
    } catch (DuplicateKeyException ex){}
    return false;
  }
+2  A: 

Second one because first option hits twice the db while second one just once.

Macarse
A: 

The short answer is that you need to test it for yourself. My gut feeling is that doing a small select to check for the existence will perform better, but you need to verify that for yourself at volume and see whichever performs better.

In general, I don't like to leave my error checking entirely to the exception engine of whatever it is I'm doing. In other words, if I can check to see if what I'm doing is valid rather than just having an exception thrown, that's generally what I do.

I would suggest, however, using an EXISTS query rather than count(*)

if(exists (select 1 from mytable where key = "somekey"))
    return false
else
    insert the row

All that being said (from an abstract, engine-neutral perspective), I'm pretty sure that MySQL has some keywords that can be used to insert a row into a table only if the primary key doesn't exist. This may be your best bet, assuming you're OK with using MySQL-specific keywords.

Another option would be to place the logic entirely in the SQL statement.

Adam Robinson
A: 

another two options in mysql are to use

insert ignore into....

and

insert into .... on duplicate key update field=value

including on duplicate key update field=field

See: http://dev.mysql.com/doc/refman/5.0/en/insert.html

Edit: You can test affected_rows for whether or not the insert had an effect or not.

Jonathan Fingland
Similar statements exists also on Oracle and SQL Server 2008: MERGE.
Remus Rusanu
+5  A: 

Insert the row, catch the duplicate key error. My personal choice

I reckon this might perform better, depending on the cost of throwing the exception against the cost of hitting the db twice.

Only by testing both scenarios wilil you know for sure

Stuart
+1  A: 

In my opinion, this is an excellent case for using exceptions (since the duplicate is exceptional), unless you're counting on there to, most of the time, be a row already (i.e., you're doing "insert, but update if exists" logic.)

If the purpose of the code is to update, then you should either use the select or an INSERT ... ON DUPLICATE KEY UPDATE clause (if supported by your database engine.) Alternatively, make a stored procedure that handles this logic for you.

Blixt
Good point, most of the time there will not be a key collision.
Adam B
A: 

Now that I've found Martin Fowler's book online, a decent way to do it is with a key table- see pg 222 for more info.

RichardOD
+2  A: 

Try the insert, then catch the error.

Otherwise, you could still have a concurrency issue between two active SPIDs (lets say two web users on the system at the same time), in which case, you'd have to catch the error anyway:

User1: Check for key "newkey"? Not in database.
User2: Check for key "newkey"? Not in database.
User1: Insert key "newkey". Success.
User2: Insert key "newkey". Duplicate Key Error.

You can mitigate this by using explicit transactions or setting the transaction-isolation level, but its just easier to use the second technique, unless you are sure only one application thread is running against the database at all times.

BradC