I'm trying to use an SQL insert statement to migrate rows from a table in one database to a table in a different database. The statement works until I add a unique index on the destination table; after that, I can't get the insert statement to exclude the duplicates. Here's what I thought should work:

INSERT INTO [MyDB].[dbo].[HPB] (
    [HPID],
    [BusinessID]
)
SELECT 
    PersonId = (SELECT ID FROM [MyDB].[dbo].[HP] WHERE PersonID = lPersonId),
    lBusinessId
FROM [MyOriginalDB].[dbo].[tblEmployment]
WHERE
    lPersonId in (SELECT PersonID FROM [MyDB].[dbo].[HP])
AND
    lBusinessId in (SELECT ID FROM [MyDB].[dbo].[Business])
AND 
   NOT EXISTS (SELECT * FROM [MyDB].[dbo].[HPB] WHERE 
     [HPID] = (SELECT ID FROM [MyDB].[dbo].[HP] WHERE PersonID = lPersonId) 
     AND [BusinessID] = lBusinessId)

The schema for the HPB table is:

CREATE TABLE [dbo].[HPB](
   [ID] [int] IDENTITY(1,1) NOT NULL,
   [HPID] [int] NOT NULL,
   [BusinessID] [int] NOT NULL,
   CONSTRAINT [PK_HealthProfessionalBusiness] PRIMARY KEY CLUSTERED ([ID]))

The unique index is on the [MyDB].[dbo].[HPB] table and covers the columns (HPID, BusinessID).

When I run the insert I get an error about duplicate row inserts and I can't work out why the SQL below doesn't exclude the duplicates.

 NOT EXISTS (SELECT * FROM [MyDB].[dbo].[HPB] WHERE 
     [HPID] = (SELECT ID FROM [MyDB].[dbo].[HP] WHERE PersonID = lPersonId) 
     AND [BusinessID] = lBusinessId)
+1  A: 

Use:

INSERT INTO [MyDB].[dbo].[HPB] 
  ([HPID], [BusinessID])
SELECT DISTINCT
       h.id,
       e.lbusinessid
  FROM [MyOriginalDB].[dbo].[tblEmployment] e
  JOIN [MyDB].[dbo].[HP] h ON h.personid = e.lpersonid
 WHERE e.lbusinessid in (SELECT ID FROM [MyDB].[dbo].[Business])
   AND NOT EXISTS (SELECT NULL 
                     FROM [MyDB].[dbo].[HPB] hb
                    WHERE hb.businessid = e.lbusinessid
                      AND hb.hpid = h.id)
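
Why the DISTINCT matters: NOT EXISTS only sees the rows that were already in HPB when the statement started. If the source table contains the same person/business pair twice, both copies pass the check and the second one hits the unique index. Below is a minimal sketch of that behaviour, assuming Python's sqlite3 module as a stand-in for SQL Server purely so the example is self-contained. The shortened table names, the index name UX_HPB and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE HP (ID INTEGER PRIMARY KEY, PersonID INTEGER NOT NULL);
    CREATE TABLE Business (ID INTEGER PRIMARY KEY);
    CREATE TABLE HPB (
        ID INTEGER PRIMARY KEY AUTOINCREMENT,
        HPID INTEGER NOT NULL,
        BusinessID INTEGER NOT NULL
    );
    -- Hypothetical name for the unique index described in the question.
    CREATE UNIQUE INDEX UX_HPB ON HPB (HPID, BusinessID);
    CREATE TABLE tblEmployment (lPersonId INTEGER, lBusinessId INTEGER);

    INSERT INTO HP (ID, PersonID) VALUES (1, 100), (2, 200);
    INSERT INTO Business (ID) VALUES (10), (20);
    -- Person 100 / business 10 appears twice in the source table.
    INSERT INTO tblEmployment VALUES (100, 10), (100, 10), (200, 20);
""")

INSERT_SQL = """
    INSERT INTO HPB (HPID, BusinessID)
    SELECT {distinct} h.ID, e.lBusinessId
      FROM tblEmployment e
      JOIN HP h ON h.PersonID = e.lPersonId
     WHERE e.lBusinessId IN (SELECT ID FROM Business)
       AND NOT EXISTS (SELECT NULL
                         FROM HPB hb
                        WHERE hb.BusinessID = e.lBusinessId
                          AND hb.HPID = h.ID)
"""

def try_insert(distinct):
    """Run the insert with or without DISTINCT and report what happened."""
    sql = INSERT_SQL.format(distinct="DISTINCT" if distinct else "")
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "unique violation"

without_distinct = try_insert(distinct=False)
with_distinct = try_insert(distinct=True)
rows = conn.execute(
    "SELECT HPID, BusinessID FROM HPB ORDER BY HPID"
).fetchall()
print(without_distinct, with_distinct, rows)
```

In this sketch the first attempt fails on the duplicated source row even though HPB is empty, and the DISTINCT version then inserts one row per pair.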
OMG Ponies
+2  A: 
Insert MyDB.dbo.HPB( HPID, BusinessID )
Select HP.ID, E.lBusinessId
From [MyOriginalDB].[dbo].[tblEmployment] As E
    Join [MyDB].[dbo].[HP] As HP
        On HP.PersonID = E.lPersonId
    Join [MyDB].[dbo].[Business] As B
        On B.ID = E.lBusinessId
    Left Join [MyDB].[dbo].[HPB] As HPB
        On HPB.BusinessID = E.lBusinessId
            And HPB.HPID = HP.ID
Where HPB.ID Is Null
Group By HP.ID, E.lBusinessId
Thomas
+1: Nicely done!
OMG Ponies
Mind though that LEFT JOIN/IS NULL is only faster on MySQL - on SQL Server, NOT IN and NOT EXISTS are more efficient: http://explainextended.com/2009/09/15/not-in-vs-not-exists-vs-left-join-is-null-sql-server/
OMG Ponies
@OMG Ponies - I did not know that about the perf difference of Left Join vs. Exists. I suppose I've always been in the habit of testing the execution plan of each to see which is more efficient. Granted, this is pushing my memory, but I seem to recall some situations on SQL Server where a Left Join with an Is Null check was faster than an Exists, and others where the reverse was true. Perhaps it was merely perception.
Thomas
sipwiz
@sipwiz - You need to place that additional column in an aggregate function. So something like `Min(E.OtherCol)` or `Max(E.OtherCol)`
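
A rough sketch of what that looks like, again assuming Python's sqlite3 module in place of SQL Server so it can be run as-is. `OtherCol` and the sample dates are hypothetical. The point is only that every selected column must either appear in the Group By or be wrapped in an aggregate, and the aggregate decides which of the duplicate rows' values survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- OtherCol is a hypothetical extra column carried along in the migration.
    CREATE TABLE tblEmployment (
        lPersonId INTEGER,
        lBusinessId INTEGER,
        OtherCol TEXT
    );
    INSERT INTO tblEmployment VALUES
        (100, 10, '2009-01-01'),
        (100, 10, '2010-06-30'),
        (200, 20, '2008-03-15');
""")

# One output row per (person, business) pair; MIN picks the smallest
# OtherCol value among the duplicates of each pair.
grouped = conn.execute("""
    SELECT e.lPersonId, e.lBusinessId, MIN(e.OtherCol) AS OtherCol
      FROM tblEmployment e
     GROUP BY e.lPersonId, e.lBusinessId
     ORDER BY e.lPersonId
""").fetchall()
print(grouped)
```

Swapping MIN for MAX would keep the later date for the duplicated pair instead.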
Thomas
Absolutely fantastic, thanks!
sipwiz