views:

209

answers:

4

I have a cron job that updates a large number of rows in a database. Some of the rows are new and therefore inserted and some are updates of existing ones and therefore update.

I use insert on duplicate key update for the whole data and get it done in one call.

But- I actually know which rows are new and which are updated so I can also do inserts and updates seperately.

Will seperating the inserts and updates have advantage in terms of performance? What are the mechanics behind this ?

Thanks!

A: 

It depends on which storage engine your using, MyISAM is very good at selects and inserts because it can do them concurrently, but it locks the whole table when writing so is not so good for updates. How about you try benchmarking it, and finding out which method takes longer?

Rob
A: 

From a performance standpoint the difference is in number of statements – for in memory data sets going over network and parsing query is what is taking most of time, this is why having it in single statement helps to improve performance. Since you know which need to be inserted vs updated, I do not believe you will see any performance difference. If the Update uses a WHERE statement in which the ID of the record to be updated is indexed, you should see no performance difference.

Gary
+2  A: 

You say

I actually know which rows are new and which are updated so I can also do inserts and updates seperately.

If you know without hitting the database which are INSERTs and which are UPDATEs, then running the correct statement has to be quicker than doing an INSERT ... ON DUPLICATE KEY ...

The INSERTs will be no quicker; the UPDATEs will be quicker because you don't have to attempt an INSERT first.

gdt
A: 

Are you using individual statements for each record? You might want to look at the load data infile for a bulk update. We gained some performance last time (been a year) when I tried that.

gsharma