I am developing SSIS packages that consist of 2 main steps:

Step 1: Grab all sorts of data from existing legacy systems and dump it into a series of staging tables in my database.

Step 2: Move the data from my staging tables into a more relational set of tables that I'm using specifically for my project.


In step 1 I'm just doing a bulk SELECT and a bulk INSERT; in step 2, however, I'm doing row-by-row inserts into my tables using OLEDB Command tasks so that I can log very specific row-level activity of everything that's happening. Here is my general layout for the step 2 processes.

[Screenshot: general data flow layout for a step 2 process]

You'll notice 3 OLEDB tasks: 1 for the actual INSERT, and 2 for success/fail INSERTs into our logging table.

The main thing I'm logging is source table/id and destination table/id for each row that passes through this flow. I'm storing this stuff in variables and adding them to the data flow using a Derived Column so that I can easily map them to the query parameters of the stored procedures.

[Screenshot: Derived Column transformation adding the logging variables to the data flow]
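For concreteness, the logging table behind those success/fail INSERTs is shaped roughly like this (column names are illustrative, not my exact schema):

    CREATE TABLE dbo.RowLog (
        LogId       INT IDENTITY(1,1) PRIMARY KEY,
        SourceTable SYSNAME  NOT NULL,  -- staging table the row came from
        SourceId    INT      NOT NULL,  -- key of the row in that table
        DestTable   SYSNAME  NOT NULL,  -- relational table it landed in
        DestId      INT      NULL,      -- new key; NULL when the insert failed
        Succeeded   BIT      NOT NULL,
        LoggedAt    DATETIME NOT NULL DEFAULT GETDATE()
    );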

I've decided to store these logging values in variables instead of hard-coding them in the SqlCommand field on the task, because I'm pretty sure you CAN'T put variable expressions in that field (e.g. exec storedproc @[User::VariableName], ..., ..., ...). So this is the best solution I've found.

[Screenshot: the SqlCommand field on the OLEDB Command task]
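To make that concrete: the SqlCommand field only accepts ? parameter markers, which you then map to input columns (including the ones the Derived Column added) on the Column Mappings tab. Something like this, with an illustrative procedure name:

    EXEC dbo.usp_InsertCustomerRow ?, ?, ?, ?
    -- Param_0 -> SourceTable  (Derived Column, from @[User::SourceTable])
    -- Param_1 -> SourceId     (column already in the data flow)
    -- Param_2 -> DestTable    (Derived Column, from @[User::DestTable])
    -- Param_3 -> Name         (the actual data column being inserted)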

Is this the best solution? Probably not.

Is it good performance-wise to add 4 logging columns to a data flow that consists of 500,000 records? Probably not.

Can you think of a better way?

+1  A: 

I really don't think calling an OLEDBCommand 500,000 times is going to be performant.

If you are already using staging tables, load it all into a staging table and take it from there in T-SQL or even another data flow (or into a raw file and then something else, depending on your complete operation). A bulk insert is going to be hugely more efficient.
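For example, a single set-based statement can move the whole staging table and still capture per-row source/destination ids, using MERGE's OUTPUT clause (which, unlike the OUTPUT of a plain INSERT...SELECT, can reference source columns). Table and column names here are illustrative:

    MERGE INTO dbo.Customer AS dest
    USING dbo.stg_Customer AS src
        ON 1 = 0                          -- never matches, so every row inserts
    WHEN NOT MATCHED THEN
        INSERT (Name, Email)
        VALUES (src.Name, src.Email)
    OUTPUT
        'stg_Customer', src.CustomerId,   -- source table/id
        'Customer', inserted.CustomerId,  -- destination table/id (identity)
        1, GETDATE()
    INTO dbo.RowLog (SourceTable, SourceId, DestTable, DestId, Succeeded, LoggedAt);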

Cade Roux
This solution calls for SSIS, so a pure T-SQL-based approach won't work. Care to elaborate?
Phil Scholtes
@Phil Scholtes I'm not sure what else to say: calling an OLEDBCommand 500,000 times is going to perform horribly. You need to stream to a table for performance. You can always add derived columns to hold your variables (typically we put the metadata in a batch table and then a batch id in the stream).
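A rough sketch of that batch pattern (table and variable names are made up): the metadata lives once per batch, and each row in the stream only carries the BatchId, added via a Derived Column.

    CREATE TABLE dbo.BatchLog (
        BatchId     INT IDENTITY(1,1) PRIMARY KEY,
        SourceTable SYSNAME  NOT NULL,
        DestTable   SYSNAME  NOT NULL,
        StartedAt   DATETIME NOT NULL DEFAULT GETDATE()
    );

    -- In an Execute SQL Task before the data flow: create the batch row
    -- and hand the new id back to an SSIS variable such as @[User::BatchId].
    INSERT INTO dbo.BatchLog (SourceTable, DestTable)
    VALUES ('stg_Customer', 'Customer');
    SELECT CAST(SCOPE_IDENTITY() AS INT) AS BatchId;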
Cade Roux
A: 

To add to Cade's answer: if you truly need the logging info on a row-by-row basis, your best bet is to leverage the OLEDB Destination and use one or both of the following transformations to add columns to the data flow:

Derived Column Transformation

Audit Transformation

This should be your best bet and shouldn't add much overhead.
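In other words, if the logging columns ride along in the data flow, a plain OLEDB Destination (fast load) can write the data and the audit trail in one bulk insert; the destination table just needs the extra columns. A sketch, with illustrative names:

    CREATE TABLE dbo.Customer (
        CustomerId  INT IDENTITY(1,1) PRIMARY KEY,
        Name        NVARCHAR(100) NOT NULL,
        -- audit columns populated from the data flow:
        SourceTable SYSNAME  NOT NULL,       -- Derived Column from a variable
        SourceId    INT      NOT NULL,       -- key carried through the flow
        PackageName NVARCHAR(256) NULL,      -- Audit transformation output
        LoadedAt    DATETIME NOT NULL DEFAULT GETDATE()
    );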

JasonHorner
Will either of these work for using custom fields like table name or message?
Phil Scholtes
The derived column can be any field you want. You would simply map it to a matching column in your table.
JasonHorner