I have a merge statement that needs to compare on many columns. The source table has 26,000 rows. The destination table has several million rows. The destination table only has a typical Primary Key index on an int-type column.

I did some selects with group by to count the number of unique values in the source.

The matching part of the MERGE is:

Merge Into desttable
Using #temptable
On
(
    desttable.ColumnA = #temptable.ColumnA
    and desttable.ColumnB = #temptable.ColumnB
    and desttable.ColumnC = #temptable.ColumnC
    and desttable.ColumnD = #temptable.ColumnD
    and desttable.ColumnE = #temptable.ColumnE
    and desttable.ColumnF = #temptable.ColumnF
)
When Not Matched Then Insert Values (.......)

-- ColumnA: 167 unique values in #temptable
-- ColumnB: 1 unique value in #temptable
-- ColumnC: 13 unique values in #temptable
-- ColumnD: 89 unique values in #temptable
-- ColumnE: 550 unique values in #temptable
-- ColumnF: 487 unique values in #temptable

-- ColumnA: 3690 unique values in desttable
-- ColumnB: 3 unique values (plus null is possible) in desttable
-- ColumnC: 1113 unique values in desttable
-- ColumnD: 2662 unique values in desttable
-- ColumnE: 1770 unique values in desttable
-- ColumnF: 1480 unique values in desttable

The merge right now takes a very, very long time. I think I need to change my primary key but am not sure what the best tactic might be. 26,000 rows can be inserted on the first merge, but subsequent merges might only have ~2,000 inserts to do. Since I have no indexes and only a simple PK, everything is slow. :)

Can anyone point out how to make this better?

Thanks!

+2  A: 

Well, an obvious candidate would be an index on the columns you use to do your matching in the MERGE statement - do you have an index on (ColumnA, ColumnB, ColumnC, ColumnD, ColumnE, ColumnF) on your destination table?

This tuple of columns is being used to determine whether or not a row from your source table already exists in the destination. If you don't have that index or any other usable index in place, you basically get a table scan on the large destination table for each row in your source table.

If not, I would try adding it and then see how the runtime behavior changes. Does the MERGE now take a little less than a very, very long time?
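
For illustration, such an index might look like the sketch below - the index name and the column order are assumptions (ordered by the distinct counts given in the question, most selective first, which is a common rule of thumb), not something prescribed by the question:

-- sketch only: name and column order are assumptions
CREATE NONCLUSTERED INDEX IX_desttable_MergeKey
ON desttable (ColumnA, ColumnD, ColumnE, ColumnF, ColumnC, ColumnB);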

marc_s
Experimenting a little more, I added a varchar(42) column 'HashCol' to desttable and made a unique index on it, then added the column to the temp table with this:

Update #temptable
Set HashCol = (SELECT SUBSTRING(master.dbo.fn_varbintohexstr(HASHBYTES('Sha1', Cast([ColumnA] as VarChar(16)) + ColumnB + ColumnC + ColumnD + ColumnE + ColumnF)), 3, 32))

Then in the proc I indexed the HashCol and merge on that. I haven't tried a composite index yet because I am not sure what the optimal order is for the columns; IIRC the most unique columns should be first.
ScSub
I have decided to stick with adding the HashCol because I found an edge case where ColumnB can be null, which raised an error that was easy to see in the script. I also just found out that the columns can change, and it's easier to change 'what matters' with the HashCol than it is to drop and recreate the composite index. Thanks all for the suggestions!
ScSub
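
(A NULL-safe variant of that hash update might look like the sketch below; the ISNULL wrapping is only one assumed way of handling the ColumnB-can-be-null edge case, and it assumes the columns are character types:)

-- sketch: wrap nullable columns so a NULL doesn't null out the whole hash input
Update #temptable
Set HashCol = SUBSTRING(master.dbo.fn_varbintohexstr(
    HASHBYTES('Sha1',
        Cast([ColumnA] as VarChar(16))
        + ISNULL(ColumnB, '')
        + ISNULL(ColumnC, '')
        + ISNULL(ColumnD, '')
        + ISNULL(ColumnE, '')
        + ISNULL(ColumnF, ''))), 3, 32)
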
+1  A: 

My suggestion: if you only need to run it once, the MERGE statement is acceptable, assuming time is not that critical. But if you're going to use the script more often, I think it's better to do it step by step instead of using the MERGE statement - that is, write your own select, insert, update, and delete statements to reach the same goal. With this you'll have more control over almost everything (query optimization, indexing, etc.).

In your case, separating the 6 join criteria might be more efficient than combining them all at once; the downside is you'll have a longer script.
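
For example, the insert-only step might look like the sketch below (a sketch only - the column list is abbreviated, and the exact statements would depend on the real schema):

Insert Into desttable (ColumnA, ColumnB, ColumnC, ColumnD, ColumnE, ColumnF /* , ... */)
Select t.ColumnA, t.ColumnB, t.ColumnC, t.ColumnD, t.ColumnE, t.ColumnF /* , ... */
From #temptable t
Where Not Exists
(
    -- only insert rows that have no match on all six columns
    Select 1
    From desttable d
    Where d.ColumnA = t.ColumnA
    and d.ColumnB = t.ColumnB
    and d.ColumnC = t.ColumnC
    and d.ColumnD = t.ColumnD
    and d.ColumnE = t.ColumnE
    and d.ColumnF = t.ColumnF
)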

koderoid
Why roll your own? I highly doubt you'll manage to be faster than the MERGE statement. Plus, the MERGE statement is one transactional unit - if you roll your own, you have to do a lot of the basic bookkeeping and administration yourself.
marc_s
Exactly - it'll be faster to break down one transactional unit with a huge set of data into multiple transactional units with small sets of data. Like in his sample, if he's going to process 26,000 or a million rows, would it be better to have one criterion per set:

SET_A = where Table1.ColA = Table2.ColA
SET_B = SET_A where Table1.ColB = Table2.ColB
...

than one huge set with all the criteria:

Result = where Table1.ColA = Table2.ColA and Table1.ColB = Table2.ColB and ...

Regardless of indexing at this point. Regarding administration and bookkeeping? It's just one script, I don't see any issue with that.
koderoid
@koderoid - Those 6 columns are the joining criteria - I can't see how you propose they can be split up.
Martin Smith
@TempTABLE1 = from (physical table/s) where Table1.ColA = Table2.ColA
@TempTABLE1 = from @TempTABLE1 where Table1.ColB = Table2.ColB
and so on... It can also be @TempTABLE1, @TempTABLE2...
koderoid