tags:

views:

468

answers:

9

Often I want to add a value to a table or update the value if its key already exists. This can be accomplished in several ways, assuming a primary or unique key is set on the 'user_id' and 'pref_key' columns in the example:

1. Blind insert, update if receiving a duplicate key error:

// Try to insert as a new value
INSERT INTO my_prefs 
(user_id, pref_key, pref_value)
VALUES (1234, 'show_help', 'true');

// If a duplicate-key error occurs run an update query
UPDATE my_prefs 
SET pref_value = 'true'
WHERE user_id=1234 AND pref_key='show_help';

2. Check for existence, then select or update:

// Check for existence
SELECT COUNT(*) 
FROM my_prefs
WHERE user_id=1234 AND pref_key='show_help';

// If count is zero, insert
INSERT INTO my_prefs 
(user_id, pref_key, pref_value) 
VALUES (1234, 'show_help', 'true');

// If count is one, update
UPDATE my_prefs 
SET pref_value = 'true' 
WHERE user_id=1234 AND pref_key='show_help';

The first way seems to be preferable as it will require only one query for new inserts and two for an update, where as the second way will always require two queries. Is there anything I'm missing though that would make it a bad idea to blindly insert?

A: 

The first way is the preferred way as far as I know.

J.J.
+4  A: 

Personally I am never a fan of exception based programming (expecting an exception in the normal operation of an application) and to me the second example is much more readable/maintainable.

There are situations where this would make a difference (very tight loops for example) but I think there should be a good reason to write code like this rather than it being the default.

Dave Verwer
Maybe in Ruby, but it's not true in a database. The overhead of the check in a table is not like the overhead of checking for an object in memory. You are misapplying standards for your language to a database. Database people have the opposite standard for a reason, it performs much better.
This is not uncommon. The more I learn about SQL server, the more I realize that my standard for my Oracle code would be a horrible idea for SQL Server code. Just because it applies in one place doesn't mean it's a good idea everywhere.
A: 

In your DAO model you could have an id field.

  • If set to null / -1 / whatever, the data hasn't been persisted.

  • When you persist it (or retrieve from database), set it to the id value in the database.

  • Your persist method can check the ID and pass it onto the update() or add() implementation.

  • Flaws: Getting out of sync with the database, etc. I'm sure there are more, but I really should get some work done...

JeeBee
+7  A: 

There is the third MySQL way, which would be the preferred one in that RDBMS

INSERT INTO my_prefs 
(user_id, pref_key, pref_value) 
VALUES (1234, 'show_help', 'true')
ON DUPLICATE KEY 
UPDATE pref_value = 'true'
Vinko Vrsalovic
Wow! I didn't know that existed!
Rich Bradshaw
+2  A: 

You may be able to use REPLACE instead, or if using a more current MySQL you get the option of using "INSERT ... ON DUPLICATE KEY UPDATE"

The fact that several people brought this up in quick succession says "always check the MySQL docs" when you have an issue, as they're decent and in many cases, leads directly to the solution.

Paul Kroll
+12  A: 

have a look at the ON DUPLICATE KEY syntax in http://dev.mysql.com/doc/refman/5.0/en/insert-select.html

INSERT [LOW_PRIORITY | HIGH_PRIORITY] [IGNORE]
[INTO] tbl_name [(col_name,...)]
SELECT ...
[ ON DUPLICATE KEY UPDATE col_name=expr, ... ]
Ken
That is freaking awesome! I'm going to go trim down about 2 pages of code right now.
Mark Biek
I didn't know that; we're still using MySQL 4 (because upgrading is not trivial) which doesn't do it. It does have, and we use, REPLACE which does a similar thing but is less flexible.
Mark Baker
A: 

So long as you're using MySQL you can use the "ON DUPLICATE" keyword. For example:

INSERT INTO my_prefs (user_id, pref_key, pref_value) VALUES (1234, 'show_help', 'true') ON DUPLICATE KEY UPDATE (pref_key, pref_value) VALUES ('show_help', 'true');

Eric Lamb
+3  A: 

If you want to avoid "the exception" by perhaps inserting a doublette and you want to use standard SQL (and your programming language / database returns the count of the updated rows) then use the following "SQL" - commands (pseudo-code):

int i = SQL("UPDATE my_prefs ...");
if(i==0) {
    SQL("INSERT INTO my_prefs ...");
}

This also takes in account that - for the most use cases - updates do occur more often than inserts.

Georgi
Why do updates occur more often than inserts? I bet people on SO insert new answers far more often then they edit them.
+2  A: 

Will there be concurrent INSERTs to these rows? DELETEs?

"ON DUPLICATE" sounds great (the behavior is just what you want) provided that you're not concerned about portability to non-MySQL databases.

The "blind insert" seems reasonable and robust provided that rows are never deleted. (If the INSERT case fails because the row exists, the UPDATE afterward should succeed because the row still exists. But this assumption is false if rows are deleted - you'd need retry logic then.) On other databases without "ON DUPLICATE", you might consider an optimization if you find latency to be bad: you could avoid a database round trip in the already-exists case by putting this logic in a stored procedure.

The "check for existence" is tricky to get right if there are concurrent INSERTs. Rows could be added between your SELECT and your UPDATE. Transactions won't even really help - I think even at isolation level "serializable", you'll see "could not serialize access due to concurrent update" errors occasionally (or whatever the MySQL equivalent error message is). You'll need retry logic, so I'd say the person above who suggests using this method to avoid "exception-based programming" is wrong, as is the person who suggests doing the UPDATE first for the same reason.

Scott Lamb