tags:

views:

476

answers:

1

Hello

I have a ETL type requirement for SQL Server 2005. I am new to SSIS but I believe that it will be the right tool for the job.

The project is related to a loyalty card reward system. Each month partners in the scheme send one or more XML files detailing the qualiifying transactions from the previous month. Each XML file can contain up to 10,000 records. The format of the XML is very simple, 4 "header" elements, then a repeating sequence containing the record elements. The key record elements are card_number, partner_id and points_awarded.

The process is currently running in production but it was developed as a c# app which runs an insert for each record individually. It is very slow, taking over 8 hours to process a 10,000 record file. Through using SSIS I am hoping to improve performance and maintainability.

What I need to do:

  1. Collect the file
  2. Validate against XSD
  3. Business Rule Validation on the records. For each record I need to ensure that a valid partner_id and card_number have been supplied. To do this I need to execute a lookup against the partner and card tables. Any "bad" records should be stripped out and written to a response XML file. This is the same format as the request XML, with the addition of an error_code element. The "good" records need to be imported into a single table.

I have points 1 and 2 working ok. I have also created an XSLT to transform the XML into a flat format ready for insert. For point 3 I had started down the road of using a ForEach Loop Container control in the control flow surface, to loop each XML node, and the SQL Lookup task. However, this would require a call to the database for each lookup and a call to the file system to write out the XML files for the "bad" and "good" records.

I believe that better performance could be achieved by using the Lookup control on the data flow surface. Unfortunately, I have no experience of working with the data flow surface.

Does anyone have a suggestion as to the best way to solve the problem? I searched the web for examples of SSIS packages that do something similar to what I need but found none - are there any out there?

Thanks

Rob.

A: 

SSIS is frequently used to load data warehouses, so your requirement is nothing new. Take a look at this question/answer, to get you started with tutorials etc.

For-each in control flow is used to loop through files in directory, tables in a db etc. Data flow is where records fly through transformations from a source (your xml file) to a destination (tables).

You do need a lookup in one of its many flavors. Google for "ssis loading data warehouse dimensions"; this will eventually show you several techniques of efficiently using lookup transformation.

To flatten the XML (if simple enough), I would simply use XML source in data flow, XML task is for heavier stuff.

Damir Sudarevic

related questions