Edit: using SQL Server 2005.
I have a query that checks whether rows from a legacy database have already been imported into a new database, and imports them if they have not. The legacy database was badly designed and the legacy table has no unique id, so I have to use heuristics to decide whether a row has already been imported. (I have no control over the legacy database.) The new database has a slightly different structure, so I check several values (whether the create dates match, whether the group numbers match, etc.) to decide heuristically whether the row exists in the new database. It's not very pretty, but the bad design of the legacy system it has to interface with leaves me little choice.
Anyhow, the users of the system started throwing 10x to 100x more data at it than I designed for, and now the query runs too slowly. Can you suggest a way to make it faster? Here is the code, with some parts redacted for privacy or to simplify, but I think I left the important part:
INSERT INTO [...NewDatabase...]
SELECT [...Bunch of columns...]
FROM [...OldDatabase...] AS t1
WHERE t1.Printed = 0
  AND NOT EXISTS (SELECT *
                  FROM [...New Database...] AS s3
                  WHERE year(s3.dtDatePrinted) = 1850 --This allows for re-importing rows marked for reprint
                    AND CAST(t1.[Group] AS int) = CAST(s3.vcGroupNum AS int)
                    AND RTRIM(t1.Subgroup) = s3.vcSubGroupNum
                    AND RTRIM(t1.SSN) = s3.vcPrimarySSN
                    AND RTRIM(t1.FirstName) = s3.vcFirstName
                    AND RTRIM(t1.LastName) = s3.vcLastName
                    AND t1.CaptureDate = s3.dtDateCreated)
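One thing I am wondering about is the year(s3.dtDatePrinted) = 1850 test: wrapping the column in a function generally prevents an index seek on it, and a plain date range should select the same rows. Below is a minimal, self-contained sketch of that idea, not the real schema. The temp table #NewRows, the column types, the sample rows, and the index IX_NewRows_Match are all made up for illustration; only the column names come from the query above. I have not measured whether this helps on the real data.

```sql
-- Hypothetical stand-in for the new table (types are assumptions);
-- only the columns used in the match are included.
CREATE TABLE #NewRows (
    vcGroupNum    varchar(20),
    vcSubGroupNum varchar(20),
    vcPrimarySSN  varchar(11),
    vcFirstName   varchar(50),
    vcLastName    varchar(50),
    dtDateCreated datetime,
    dtDatePrinted datetime
);

-- Hypothetical index on the new-table columns that the original query
-- compares without wrapping them in a function.
CREATE INDEX IX_NewRows_Match
    ON #NewRows (vcPrimarySSN, vcLastName, vcFirstName,
                 vcSubGroupNum, dtDateCreated, dtDatePrinted);

-- Made-up sample rows: one with the 1850 marker date, one with a later print date.
-- (Separate INSERTs, since SQL Server 2005 has no multi-row VALUES.)
INSERT INTO #NewRows VALUES ('0012', 'A', '123-45-6789', 'Ann', 'Lee', '20080102', '18500101');
INSERT INTO #NewRows VALUES ('0012', 'A', '123-45-6789', 'Ann', 'Lee', '20080102', '20080301');

-- Selects the same rows as year(s3.dtDatePrinted) = 1850,
-- but leaves the column bare so an index on it can be used.
SELECT s3.*
FROM #NewRows AS s3
WHERE s3.dtDatePrinted >= '18500101'
  AND s3.dtDatePrinted <  '18510101';

DROP TABLE #NewRows;
```

The CAST(s3.vcGroupNum AS int) comparison has the same shape of problem on the new-table side, but I am not sure how to avoid it without changing how the group number is stored.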