tags:

views:

924

answers:

7

I have a table with more than a millons rows. This table is use to index Tiff Images. Each image have fields like date, number, etc. I have users that index this images in batch of 500. I need to know if it is better to insert first 500 and then 500 updates or when the user ends do the 500 insert with all the data. A very important thing is that if I do the 500 inserts at first this time is free for me because I can do it the night before.

So the question is: Is better to do Inserts or Updates and why? I have define a Id value for each image, I have other index on the fields.

A: 

I'm not a database guy, but I imagine doing the inserts in one shot would be faster because the updates require a lookup whereas the inserts do not.

Giovanni Galbo
+3  A: 

Giovanni, it will also depend on other issues such as indexing (clustered or non-clustered) and fill factor. Your specific situation will contribute largely on how you proceed.

Galwegian
+2  A: 

Doing the inserts first and then the updates does seem to be a better idea for several reasons. You will be inserting at a time of low transaction volume. Since inserts have more data, this is a better time to do it.

Since you are using an id value (which is presumably indexed) for updates, the overhead of updates will be very low. You would also have less data during your updates.

You could also turn off transactions at the batch (500 inserts/updates) level and use it for each individual record, thus reducing some overhead.

Finally, test this out to see the actual performance on your server before making a final decision.

Krishna Kumar
+1  A: 

The execution plan for each query will tell you which one should be more expensive. The real limiting factor will be the writes to disk, so you may need to run some tests while running perfmon to see which query causes more writes and causes the disk queue to get the longest (longer is bad).

Booji Boy
+1  A: 

This isn't a cut and dry question. Krishna's and Galegian's points are spot on.

For updates, the impact will be lessened if the updates are affecting fixed-length fields. If updating varchar or blob fields, you may add a cost of page splits during update when the new value surpasses the length of the old value.

spoulson
+1  A: 

I think inserts will run faster. They do not require a lookup (when you do an update you are basically doing the equivalent of a select with the where clause). And also, an insert won't lock the rows the way an update will, so it won't interfere with any selects that are happening against the table at the same time.

Eric Z Beard
+5  A: 

Updates in Sql server result in ghosted rows - i.e. Sql crosses one row out and puts a new one in. The crossed out row is deleted later.

Both inserts and updates can cause page-splits in this way, they both effectively 'add' data, it's just that updates flag the old stuff out first.

On top of this updates need to look up the row first, which for lots of data can take longer than the update.

Inserts will just about always be quicker, especially if they are either in order or if the underlying table doesn't have a clustered index.

When inserting larger amounts of data into a table look at the current indexes - they can take a while to change and build. Adding values in the middle of an index is always slower.

You can think of it like appending to an address book: Mr Z can just be added to the last page, while you'll have to find space in the middle for Mr M.

Keith