I have two tables, one has about 1500 records and the other has about 300000 child records. About a 1:200 ratio. I stage the parent table to a staging table, SomeParentTable_Staging, and then I stage all of it's child records, but I only want the ones that are related to the records I staged in the parent table. So I use the below query to perform this staging by joining with the parent tables staged data.
--Stage child records
INSERT INTO [dbo].[SomeChildTable_Staging]
([SomeChildTableId]
,[SomeParentTableId]
,SomeData1
,SomeData2
,SomeData3
,SomeData4
)
SELECT [SomeChildTableId]
,D.[SomeParentTableId]
,SomeData1
,SomeData2
,SomeData3
,SomeData4
FROM [dbo].[SomeChildTable] D
INNER JOIN dbo.SomeParentTable_Staging I ON D.SomeParentTableID = I.SomeParentTableID;
The execution plan indicates that the tables are being joined with a Nested Loop. When I run just the select portion of the query without the insert, the join is performed with Hash Match. So the select statement is the same, but in the context of an insert it uses the slower nested loop. I have added non-clustered index on the D.SomeParentTableID so that there is an index on both sides of the join. I.SomeParentTableID is a primary key with clustered index.
Why does it use a nested loop for inserts that use a join? Is there a way to improve the performance of the join for the insert?