views:

52

answers:

5

I am using an OLEDB source to call a stored proc to grab a set of rows each marked with either I U or D by doing a UNION between three SELECT statements in a stored procedure.

A is a table populated with data brought over from a legacy system every night. B is our system's data we want to keep updated with the legacy system's data.

SET NOCOUNT ON;

-- entries in A that dont exist in B need to be inserted into B
SELECT 
    A.ID,
    B.ID,
    A.data1,
    A.data2,
    A.data3,
    A.data4,
    F1.ID as Fkey1,
    F2.ID as Fkey2,
    'I' as InsertUpdateDeactivateFlag
FROM A
LEFT JOIN B ON A.val = B.val
LEFT JOIN F1 ON A.f1val = F1.val
LEFT JOIN F2 ON A.f2val = F2.val
WHERE B.ID is null

UNION ALL

-- entries in A that do exist in B that have different field values
SELECT
    A.ID,
    B.ID,
    A.data1,
    A.data2,
    A.data3,
    A.data4,
    F1.ID as Fkey1,
    F2.ID as Fkey2,
    'U' as InsertUpdateDeactivateFlag
FROM A
INNER JOIN B ON A.val = B.val
LEFT JOIN F1 ON A.f1val = F1.val
LEFT JOIN F2 ON A.f2val = F2.val
WHERE
    A.data1 <> B.data1 OR
    A.data2 <> B.data2 OR
    A.data3 <> B.data3 OR
    A.data4 <> B.data4 OR
    F1.ID <> B.Fkey1 OR
    F2.ID <> B.Fkey2

UNION ALL

-- entries in B that dont exist in A should have Active set to 0
SELECT
    A.ID,
    B.ID,
    B.data1,
    B.data2,
    null as data3, -- dont need this value for deactivates
    null as data4, -- dont need this value for deactivates
    B.Fkey1,
    B.Fkey2,
    'D' as InsertUpdateDeactivateFlag
FROM A
RIGHT JOIN B ON A.val = B.val
WHERE A.ID is null

Then I am doing a conditional split using that flag to direct rows to one of three OLEDB Commands that executes stored procedures for each row (inserting, updating, or deactivating). These three procedures write to table B.

If there are only a few dozen rows going through my SSIS package it works fine. But when we wipe out table B and repopulate (a few hundred thousand inserts) the package hangs. It just hangs, no error message, no failure. The OLEDB source has read in all but the last few rows (approx 700 to go, same number every time), feeds rows through the rest of the flow and then just hangs. My coworker says it actually did finish after 75 minutes or so.

The stored proc runs flawlessly in SSMS and returns the entire result set in a few seconds every time. Never had any issues except in SSIS.

Now the weird part is that if I move the third SELECT statement (the deactivates) from being the last result set to union to being the first result set of the union, everything works fine and the package executes in around 3 minutes! Huh?!

Also, if I don't do a join on the deactivates, just return all rows from one of the tables, it works fine also. Why would SSIS care about this join or how would it even know? I've also tried wrapping the UNIONs in an outer SELECT to no avail.

We have checked SQL Server Profiler and the queries are executing fine, no locks, etc. It is my understanding that SSIS has already grabbed all the data from the OLEDB source before it starts to put rows into the pipeline. So concurrency issues due to writing to the same table I am selecting from on the SQL Server should not be an issue since the read has been complete since before the package starts it's processing (please correct me if this is wrong).

It's as if SSIS is analyzing the stored proc incorrectly when the UNIONs are in a certain order. Is this the case? Has anybody run in to similar situations? Or can somebody shed some light on this process to see where I should be looking for the issue?

In the actual stored proc there are CASE statements in the SELECT items, ISNULL()'s on the WHERE clause items, and other UDF calls in WHERE clause items and SELECT items, just in case that matters.

Thanks.

+1  A: 

The few issues I've had where a stored proc runs fine within SSMS and performs terribly outside it have been caused by parameter sniffing.

tQuarella
This particular proc doesn't take any parameters. Very informational link though.
Steve W
A: 

Sounds like you looked at most everything I'd recommend. At times like this, I generally blame grelmins, establish a work-around (such as your making the third query the first), and move on.

Have you checked what locks are present in the system during that 75 minute period? This might indicate if data is still being pulled while SSIS is trying to "jump ahead" on the subsequent tasks.

Depending on keys and cardinality, changing the third query to a "not exists" form might be more efficient--though that doesn't explain current behavior. (Too, that right outer join make me nervous--you could try it as a left outer join, like the others.)

-- entries in B that dont exist in A should have Active set to 0 
SELECT 
    A.ID, 
    B.ID, 
    B.data1, 
    B.data2, 
    null as data3, -- dont need this value for deactivates 
    null as data4, -- dont need this value for deactivates 
    B.Fkey1, 
    B.Fkey2, 
    'D' as InsertUpdateDeactivateFlag 
FROM B
WHERE NOT EXISTS (SELECT 1 FROM A WHERE val = B.val)
Philip Kelley
Unfortunately no locks detected and using table hints to deny lock/force lock have no effect.
Steve W
+1  A: 

I would try two things:

(1) In the properties window for your OLE DB Component making this SQL statement, set ValidateExternalMetadata to False (long shot but try it).

(2) Make 3 OLEDB Component sources and use the SSIS Union Component. I think SSIS wants you to do this and frankly, I usually use the component rather than trying to get UNION's to work within one OLEDB Component source.

ajdams
I will try this setting. If I have to rewrite the procs and package I may go that route, thanks.
Steve W
Agree with second point!
Philip Kelley
A: 

Since SSIS may start to process rows as they are received, my guess is that one of the deactivate rows can't be processed because one of the other statements in the union is already grabbing that row (or creating it).

Have you tried ordering the rows in the UNION? Remember, without an ORDER BY, changing the order of the statements will probably change the order in which the rows are received and thus processed.

Tom H.
This is interesting. I will try this and see what happens.
Steve W
Throwing the ORDER BY, no matter what order (asc/desc), worked. Adding the ORDER BY increases the time of the execution plan and perhaps that last SELECT is done before the inserts start. I am going to look into the implications of this particular join (the deactivates) and why it may hang when inserting at the same time.
Steve W
A: 

Are you doing significant work in SSIS?

Couldn't you bring the data in in parallel and UNION in SSIS? I know you have UNION ALL, but the SQL Server isn't really special to perform that. Also, why UNION if you're just going to split again?

Seems like you could just make three separate table-valued UDFs and simply:

INSERT INTO B
SELECT *
FROM udf_INSERTS()

UPDATE B
SET whatever
FROM B
INNER JOIN udf_UPDATES() AS u
    ON u.key = B.key

DELETE
FROM B
WHERE B.key IN (SELECT key FROM udf_DELETES())

All in straight SQL.

Calling SPs for individual inserts/updates is just not very fast in SSIS. So typically, I like to stream to a table and then use a SQL statement to perform the set operation.

In your case, it's not so clear that SSIS is doing a lot of work it's good at (pivots, sorts, aggregates, lookups, heterogenous data, scripts, validation) (you are unioning outside of SSIS, then splitting in SSIS and calling individual SPs for the resulting rows), so I'm just throwing that out there.

Cade Roux