views:

89

answers:

2

I have two tables, one has about 1500 records and the other has about 300000 child records. About a 1:200 ratio. I stage the parent table to a staging table, SomeParentTable_Staging, and then I stage all of it's child records, but I only want the ones that are related to the records I staged in the parent table. So I use the below query to perform this staging by joining with the parent tables staged data.

--Stage child records
INSERT INTO [dbo].[SomeChildTable_Staging]
           ([SomeChildTableId]
           ,[SomeParentTableId]
           ,SomeData1
           ,SomeData2
           ,SomeData3
           ,SomeData4    
        )
    SELECT [SomeChildTableId]
        ,D.[SomeParentTableId]
           ,SomeData1
           ,SomeData2
           ,SomeData3
           ,SomeData4    
  FROM [dbo].[SomeChildTable] D
INNER JOIN dbo.SomeParentTable_Staging I ON D.SomeParentTableID = I.SomeParentTableID;

The execution plan indicates that the tables are being joined with a Nested Loop. When I run just the select portion of the query without the insert, the join is performed with Hash Match. So the select statement is the same, but in the context of an insert it uses the slower nested loop. I have added non-clustered index on the D.SomeParentTableID so that there is an index on both sides of the join. I.SomeParentTableID is a primary key with clustered index.

Why does it use a nested loop for inserts that use a join? Is there a way to improve the performance of the join for the insert?

+2  A: 

A few thoughts:

  1. Make sure your statistics are up to date. Bad statistics account for many of the bizarre "intermittent" query plan problems.

  2. Make sure your indexes are covering, otherwise there's a much higher probability of the optimizer ignoring them.

  3. If none of that helps, you can always force a specific join by writing INNER HASH JOIN as opposed to just INNER JOIN.

Aaronaught
+1 for the third thought (first two don't make a whole lot of sense to me tho)
Andomar
@Andomar: Sometimes even a *slight* difference in the parse tree (and an `INSERT` tacked on at the end counts) can result in a totally different plan if the optimizer thinks it's crossing a threshold. Anyway, I'm not 100% sure that either of those will work, but can't hurt to try!
Aaronaught
@Aaronaught: Exactly. Of the 3 options only an explicit `hash join` is a permanent fix :)
Andomar
@Andomar: Yeah, but it's a hacky fix. I use join hints only as a last resort!
Aaronaught
@Aaronaught I was able to use `HASH JOIN` to force the hash match so that I at least could test the difference in performance and see if it was worth my time investigating further.
AaronLS
+1  A: 

Does the destination table have a clustered index? The choice of join may be necessary to facilitate the ordering of the data in the insert. I've seen execution plans differ depending on whether the destination table has a clustered index and what column(s) it is on.

Cade Roux
Yes it does. This makes sense, because if I force the hash match with `hash join` then there are two extra steps added to the query to do a clustered index insert.
AaronLS
@AaronLS - Yeah, that's exactly the kind of case I've seen. So does the insert with the explicit hash join perform quicker than the insert without, given the extra steps needed to accommodate the clustered index when you force the hash join?
Cade Roux
@Cade The `hash join` is slightly faster at about 18 seconds and without it is 20 seconds. So it is not significant at this point. Later on I might test out using a non-clustered primary key and run the entire staging processes and see if overall the speed increase outweighs the decrease, since there would be other queries effected by the change.
AaronLS