I have a centralized table that is populated with row counts and process-level information for every stored procedure on my server. What is the best method for logging rows in a data flow in SSIS and logging start and end times for individual data flows?

The only method I have found for logging data flow statistics is to log OnProgress events to a SQL Server table using database logging, then run a query that builds a table-valued parameter, scrapes the log for the row counts, and inserts the values into my centralized performance metrics table via a stored procedure that accepts table-valued parameters. Here's the code that scrapes the log:

SET NOCOUNT ON;

-- Declare constant values. The ? placeholders are parameters mapped in the
-- Execute SQL Task (the execution GUID and the process name).
DECLARE @RECENT_CHANGE_DATE DATETIME = CAST(CONVERT(CHAR(10), DATEADD(DAY, -7, GETDATE()), 111) AS DATETIME);
DECLARE @EXECUTIONID VARCHAR(100) = REPLACE(REPLACE(CAST(? AS VARCHAR(100)), '{', ''), '}', '');
DECLARE @PROCESS VARCHAR(100) = ?;
DECLARE @SCHEMA VARCHAR(100) = 'schema of objects that I monitor';
DECLARE @DATA_FLOW_PROCESS VARCHAR(150) = 'Common string at the beginning of every data flow name ' + @SCHEMA;
DECLARE @LENGTH INT = LEN(@DATA_FLOW_PROCESS);


-- Declare the performance counters tvp.
DECLARE @performance_counters_tvp
AS performance_counters_type;


-- Insert values into the tvp.
INSERT INTO @performance_counters_tvp 
 ( target_server, target_database, target_schema, target_table
 , process, step_id, step, start_date, end_date, duration_seconds
 , rows, rows_per_second, error_code)
SELECT 'target_server' = @@SERVERNAME
 , 'target_database' = 'target_database_name'
 , 'target_schema' = @SCHEMA
 , 'target_table' = SUBSTRING(source, @LENGTH + 2, LEN(source) - @LENGTH + 1)
 , 'process' = @PROCESS
 , 'step_id' = 0
 , 'step' = CAST(executionid AS VARCHAR(100))
 , 'start_date' = MIN(starttime)
 , 'end_date' = MAX(endtime)
 , 'duration_seconds' = DATEDIFF(SECOND, MIN(starttime), MAX(endtime))
 , 'rows' = MAX(CASE 
  WHEN message like '%"component "%wrote%' 
   THEN SUBSTRING(message, CHARINDEX('wrote ', message) + 6, CHARINDEX('rows.', message) - CHARINDEX('wrote ', message) - 7)
  ELSE NULL END)
 , 'rows_per_second' = ISNULL(MAX(CASE 
  WHEN message like '%"component "%wrote%' 
   THEN SUBSTRING(message, CHARINDEX('wrote ', message) + 6, CHARINDEX('rows.', message) - CHARINDEX('wrote ', message) - 7)
  ELSE NULL END), 0) / ISNULL(NULLIF(DATEDIFF(SECOND, MIN(starttime), MAX(endtime)), 0), 1)
 , 'error_code' = @@ERROR
FROM [some_database].[some_schema].[ssis_event_log] ss
WHERE ss.executionid = @EXECUTIONID
AND ss.source <> @PROCESS
AND ss.source LIKE @DATA_FLOW_PROCESS + '%'
AND ss.starttime >= @RECENT_CHANGE_DATE
GROUP BY executionid, source;

-- Insert values in the tvp into the target table using a tvp stored procedure.
EXEC [some_schema].usp_performance_counters_add_tvp @performance_counters_tvp;

Here's the code that populates my centralized log:

CREATE PROCEDURE [some_schema].[usp_performance_counters_add_tvp]
    @performance_counters_tvp performance_counters_type READONLY
AS

SET NOCOUNT ON;

INSERT INTO [some_schema].performance_counters
    ( target_server, target_database, target_schema, target_table, process, step_id
    , step, start_date, end_date, duration_seconds, rows, rows_per_second, error_code
    , inserted_rows, updated_rows, deleted_rows)  
SELECT target_server, target_database, target_schema, target_table, process, step_id
    , step, start_date, end_date, duration_seconds, rows, rows_per_second, error_code
    , rows, 0, 0
FROM @performance_counters_tvp;

There are a few downsides to this approach. Here's the short list:

1. The records are logged in one call after all the data flows have executed, so the entries arrive out of sequence in my centralized performance counters table.

2. I usually have to wrap the whole package in a sequence container so that I can add this as the last step to run after all data flows have completed. This is irritating from a design perspective, but bearable.

3. The SSIS event log scraper requires my team to follow a specific naming convention for data flows. If someone makes one minor error, the logging is corrupted. These errors are hard to find and usually surface only when we're trying to analyze the performance of a data flow and cannot locate its metrics.

4. Querying the SSIS event log in every package is slow. Despite performance hints and additional indexes, in some packages this step takes longer than it should. It is really silly when scraping the event log to find out how well a process performed takes longer than the data flows themselves took to complete.
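To catch the naming-convention mistakes in point 3 before they silently corrupt the log, we have considered a periodic audit query along these lines. This is only a sketch: it reuses the event log table and the convention prefix from the scraper above, and assumes that every "wrote N rows" message should come from a conventionally named data flow.

```sql
-- Hypothetical audit: list distinct data flow sources in the event log
-- that emitted a row-count message but do not match the naming convention.
DECLARE @SCHEMA VARCHAR(100) = 'schema of objects that I monitor';
DECLARE @DATA_FLOW_PROCESS VARCHAR(150) = 'Common string at the beginning of every data flow name ' + @SCHEMA;

SELECT DISTINCT ss.source
FROM [some_database].[some_schema].[ssis_event_log] ss
WHERE ss.starttime >= DATEADD(DAY, -7, GETDATE())
  AND ss.message LIKE '%wrote%rows%'                -- only destination row-count messages
  AND ss.source NOT LIKE @DATA_FLOW_PROCESS + '%';  -- name breaks the convention
```

Run on a schedule, a non-empty result would point directly at the misnamed data flow instead of leaving us to notice missing metrics weeks later.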

I do not believe adding a Row Count transformation in the middle of a data flow and logging the value is a valid solution. A Row Count logged mid-flow does not have the same data quality as the number of rows actually written to the destination table. I only want to log final row counts, not in-transit row counts.

One of my team members has a theory that we could somehow use an Event Handler to log the data. This seems appealing since there is probably a post-execute Event Handler that could be aware of the row count. However, if this is your proposed solution, then please provide an example of code that could be used in an Event Handler to log the rows.
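In case it helps frame an answer: my understanding of the idea is roughly the following. The "component X wrote N rows" messages fire as OnInformation events, so an Execute SQL Task inside the data flow task's OnInformation event handler could parse and log each count as it happens, rather than scraping the log afterward. This is a sketch we have not tested, not working code; the staging table name and the parameter mapping are assumptions on my part.

```sql
-- Hypothetical Execute SQL Task body for an OnInformation event handler
-- attached to the data flow task. Assumed parameter mapping:
--   0 = System::ExecutionInstanceGUID
--   1 = System::SourceName
--   2 = System::InformationDescription
DECLARE @execution_id VARCHAR(100) = REPLACE(REPLACE(CAST(? AS VARCHAR(100)), '{', ''), '}', '');
DECLARE @source       VARCHAR(150)  = ?;
DECLARE @message      VARCHAR(4000) = ?;

-- Only the "... wrote N rows." messages carry final destination counts;
-- ignore every other OnInformation event.
IF @message LIKE '%wrote%rows%'
BEGIN
    INSERT INTO [some_schema].performance_counters_staging  -- hypothetical staging table
        (execution_id, source, rows, logged_at)
    VALUES
        ( @execution_id
        , @source
        , CAST(SUBSTRING(@message, CHARINDEX('wrote ', @message) + 6,
               CHARINDEX('rows.', @message) - CHARINDEX('wrote ', @message) - 7) AS INT)
        , GETDATE());
END;
```

If this is the right direction, I would still need to know whether the event handler reliably sees the row-count message for every destination, and how to fold the staged rows into my performance counters table in sequence.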
