I am using an OLEDB source to call a stored proc to grab a set of rows each marked with either I
U
or D
by doing a UNION between three SELECT statements in a stored procedure.
A is a table populated with data brought over from a legacy system every night. B is our system's data we want to keep updated with the legacy system's data.
SET NOCOUNT ON;
-- entries in A that dont exist in B need to be inserted into B
SELECT
A.ID,
B.ID,
A.data1,
A.data2,
A.data3,
A.data4,
F1.ID as Fkey1,
F2.ID as Fkey2,
'I' as InsertUpdateDeactivateFlag
FROM A
LEFT JOIN B ON A.val = B.val
LEFT JOIN F1 ON A.f1val = F1.val
LEFT JOIN F2 ON A.f2val = F2.val
WHERE B.ID is null
UNION ALL
-- entries in A that do exist in B that have different field values
SELECT
A.ID,
B.ID,
A.data1,
A.data2,
A.data3,
A.data4,
F1.ID as Fkey1,
F2.ID as Fkey2,
'U' as InsertUpdateDeactivateFlag
FROM A
INNER JOIN B ON A.val = B.val
LEFT JOIN F1 ON A.f1val = F1.val
LEFT JOIN F2 ON A.f2val = F2.val
WHERE
A.data1 <> B.data1 OR
A.data2 <> B.data2 OR
A.data3 <> B.data3 OR
A.data4 <> B.data4 OR
F1.ID <> B.Fkey1 OR
F2.ID <> B.Fkey2
UNION ALL
-- entries in B that dont exist in A should have Active set to 0
SELECT
A.ID,
B.ID,
B.data1,
B.data2,
null as data3, -- dont need this value for deactivates
null as data4, -- dont need this value for deactivates
B.Fkey1,
B.Fkey2,
'D' as InsertUpdateDeactivateFlag
FROM A
RIGHT JOIN B ON A.val = B.val
WHERE A.ID is null
Then I am doing a conditional split using that flag to direct rows to one of three OLEDB Commands that executes stored procedures for each row (inserting, updating, or deactivating). These three procedures write to table B.
If there are only a few dozen rows going through my SSIS package it works fine. But when we wipe out table B and repopulate (a few hundred thousand inserts) the package hangs. It just hangs, no error message, no failure. The OLEDB source has read in all but the last few rows (approx 700 to go, same number every time), feeds rows through the rest of the flow and then just hangs. My coworker says it actually did finish after 75 minutes or so.
The stored proc runs flawlessly in SSMS and returns the entire result set in a few seconds every time. Never had any issues except in SSIS.
Now the weird part is that if I move the third SELECT statement (the deactivates) from being the last result set to union to being the first result set of the union, everything works fine and the package executes in around 3 minutes! Huh?!
Also, if I don't do a join on the deactivates, just return all rows from one of the tables, it works fine also. Why would SSIS care about this join or how would it even know? I've also tried wrapping the UNIONs in an outer SELECT to no avail.
We have checked SQL Server Profiler and the queries are executing fine, no locks, etc. It is my understanding that SSIS has already grabbed all the data from the OLEDB source before it starts to put rows into the pipeline. So concurrency issues due to writing to the same table I am selecting from on the SQL Server should not be an issue since the read has been complete since before the package starts it's processing (please correct me if this is wrong).
It's as if SSIS is analyzing the stored proc incorrectly when the UNIONs are in a certain order. Is this the case? Has anybody run in to similar situations? Or can somebody shed some light on this process to see where I should be looking for the issue?
In the actual stored proc there are CASE statements in the SELECT items, ISNULL()'s on the WHERE clause items, and other UDF calls in WHERE clause items and SELECT items, just in case that matters.
Thanks.