
I'm trying to get rid of some spurious warnings in my SSIS Progress log. I'm getting a bunch of warnings about unused columns in tasks that use raw SQL to do their work. I have a Data Flow responsible for archiving data in a staging table prior to loading new data. The Data Flow looks like this:

+--------------------+
| OLEDB Source task: |
| read staging table |
+--------------------+
          |
          |
+---------------------------+
| OLEDB Command task:       |
| upsert into history table |
+---------------------------+
          |
          |
+---------------------------+
| OLEDB Command task:       |
| delete from staging table |
+---------------------------+

My 'upsert' task is something like:

--------------------------------------
-- update existing rows first...
update h
set    field1 = s.field1
    ...
from   history h
inner  join staging s
on     h.id = s.id
where  h.last_edit_date <> s.last_edit_date -- only update changed records

-- ... then insert new rows
insert into history
select s.*
from   staging s
left   join history h
on     h.id = s.id
where  h.id is null
--------------------------------------

The cleanup task is also a SQL command:

--------------------------------------
delete from staging
--------------------------------------

Since the upsert task doesn't have any output column definitions, I'm getting a bunch of warnings in the log:

[DTS.Pipeline] Warning: The output column "product_id" (693) on output 
"OLE DB Source Output" (692) and component "read Piv_product staging table" (681) 
is not subsequently used in the Data Flow task. Removing this unused output column 
can increase Data Flow task performance.

How can I eliminate the references to those columns? I've tried dropping in a few different tasks, but none of them seem to let me 'swallow' the input columns and suppress them from the task's output. I'd like to keep my logs clean so I only see real problems. Any ideas?

Thanks!

A: 

Use a Union All: select only the columns you want to pass through, and delete any others.

I thought they were going to address this in the 2008 version to allow columns to be trimmed/suppressed from the pipeline.

Cade Roux
A: 

Thanks for the quick tip! Unfortunately, it doesn't work.

I inserted a Union All between the 1st and 2nd tasks and set all columns to Ignore, but then couldn't OK the editor. Since I refer to the columns explicitly in the SQL of the upsert (2nd) task, I don't need the output columns of the 1st task at all -- no columns should get passed through to the output.

Since the SQL in the 2nd task does all the work, I tried simply removing the OLEDB Source task altogether. That gets rid of the warnings -- but then the staging data never gets moved to history; without the OLEDB Source task the Data Flow doesn't do anything.

I tried replacing the column names in the SET clause with parameter placeholders, like

update history
set    ? = s.field1

but the OLEDB command task has no way to assign parameters to the placeholders, so that's a no-go, as well.
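
From what I can tell, a ? placeholder in an OLE DB Command can only stand in for a value, never a column name, so the closest valid form would be something like the statement below, with pipeline columns mapped to the placeholders by position -- and that still executes once per row, not as a set-based update.

--------------------------------------
-- values-only parameterization the OLE DB Command accepts:
-- the first ? maps to the pipeline's field1 column, the second ? to id
update history
set    field1 = ?  -- ... plus the other columns
where  id = ?
--------------------------------------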

I should have noted, I'm using SSIS 2005, not 2008....

Val
A: 

OK, I got a workaround on the MSDN forums:

use a Script Component transformation between task 1 and 2; select all the input columns; leave the script body empty.

That consumes the columns; the job processes properly, and no warnings are logged.

Still not clear why I need the OLEDB Source at all, since the SQL in task 2 connects to the source tables and does all the work. When I remove the OLEDB Source, the dataflow runs but doesn't process any rows, so the staging table never gets emptied, and the downstream process that puts changed rows into the staging table then fails with PK violations. But that's a problem for another day. This is a bit clunky, but my logs are clean.

Val
A: 

Hmm, I'm sure I'm not using StackOverflow correctly, but I can't mark my own answer immediately above as an answer. I put my followups as Answers, not comments on an answer, because the comments can't contain enough text. Plus, I did provide my own answer!

Anyway, consider this topic closed, unless someone else has a better way to suppress the warnings or get rid of the OLEDB Source task.

Val
A: 

Looking at your problem again, I think you are using SSIS "against the grain". I'm not really following what you are reading out of the staging table, since your upsert doesn't seem to depend on anything in a particular row, nor does the cleanup.

It would seem to me that the cleanup would run once for every row, but that doesn't really make sense.

DataFlows are not typically used to perform bulk actions for each row coming down the pipeline. If you are using a pipeline, UPSERTs get handled with a Lookup (or third-party TableDifference) component and then a split in the pipeline: new rows go to an OLEDB Destination (BULK INSERT), and changed rows go either to an OLEDB Command (executed once per UPDATE) or to another OLEDB Destination feeding an "UPDATE staging table".

Normally, I would do this with a DataFlow to load the staging table without any split, then a single Execute SQL Task in the control flow to perform everything else in straight SQL (an UPSERT like you have) by calling an SP.
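
Something like the sketch below -- the procedure name is made up, and the column list would follow your tables, but the body is just the SQL from your question plus the cleanup:

--------------------------------------
create procedure archive_staging
as
begin
    -- update changed rows
    update h
    set    field1 = s.field1  -- ... plus the remaining columns
    from   history h
    inner  join staging s
    on     h.id = s.id
    where  h.last_edit_date <> s.last_edit_date;

    -- insert new rows
    insert into history
    select s.*
    from   staging s
    left   join history h
    on     h.id = s.id
    where  h.id is null;

    -- empty the staging table
    delete from staging;
end
--------------------------------------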

The OLEDB Command is useful if you DON'T want to have a staging table, and instead read a flat file and execute an UPDATE or INSERT per row using a Lookup component or similar. But it will be called for every row in the pipeline.

Cade Roux
A: 

Your dataflow task should finish with the "upsert". Then, back in the control flow, create an "Execute SQL Task" for the delete from staging, and link your dataflow task to your Execute SQL task.

I don't use a 3rd-party tool for my upserts, but do as Cade suggests: split your dataflow so that new records head straight to an OLE DB Destination (or similar), while updated records go to your OLEDB Command for updates. You can split the flow using a merge join or a lookup, as sketched below.
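
As a rough illustration, using the tables from your question (the statements are placeholders, not tested):

--------------------------------------
-- Lookup reference query: match incoming rows to history on id
select id, last_edit_date
from   history

-- match output -> OLE DB Command, executed once per row;
-- the ? placeholders map positionally to pipeline columns
update history
set    field1 = ?  -- ... plus the other columns
where  id = ?

-- no-match output -> OLE DB Destination on history (bulk insert)
--------------------------------------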

AdamH
Adam, thanks -- you're exactly right, and a day or so after I posted this I finally came to that realization. Cade's right, too -- I was using SSIS against the grain. I moved the DELETE code to the Control Flow after the Data Flow completes, and all's happy now. Thanks!
Val
A: 

The warnings in your pipeline are caused by columns being selected in your data source that aren't being used in any subsequent Tasks.

The easy fix is to double-click your data source (in your case, the "read staging table" OLEDB Source task), click Columns, and deselect any columns that you don't need in any of your downstream task items.

This will remove those warnings from your progress log.

However, reading your item above, and as explained in other answers, you aren't using the Source task's columns in the subsequent items, so those columns can simply be removed.

Dale Wright
