By "variable format" I mean that each row in the text file begins with a single character code denoting the record type, and each record type has a different format. I believe this is a fairly common mainframe data export method. What makes matters worse is that the record types actually form a hierarchy. The codes in use are related as follows:

0ThisIsAFileHeader
2ThisIsABatchHeader
4ThisIsDetailData
4ThisIsDetailData
6ThisIsAMatchingBatchFooter
8ThisIsAMatchingFileFooter

0/8 are header/trailer records (each with its own format), 2/6 are sub-header/sub-trailer records (again each with its own format), and 4 indicates the actual data or detail records.
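To make the format concrete, here is a minimal Python sketch that splits each line into its type code and payload (the record names are just the placeholders from the sample above, not real layouts):

```python
# Split each line of a variable-format file into (record type, payload).
# The type names below mirror the sample data; real layouts would differ.
RECORD_TYPES = {
    "0": "file header",
    "2": "batch header",
    "4": "detail",
    "6": "batch footer",
    "8": "file footer",
}

def classify(line):
    code, payload = line[0], line[1:]
    return RECORD_TYPES.get(code, "unknown"), payload

print(classify("4ThisIsDetailData"))  # ('detail', 'ThisIsDetailData')
```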

SQL Server 2008, Visual Studio.NET 2008. Is a custom script task the only way to import this data? Given that, could someone point me to a resource that discusses doing so? Importing a fixed-width text file is pretty straightforward, but I'm not sure where the script task would fall in the control flow and how the data would be utilized by subsequent steps.

+3  A: 

This could be done within SSIS although you will have to jump through a few hoops. Off the top of my head this is one approach:

  • The source file should be read as 2 columns: the first character, and a string containing the rest of the line.
  • A Conditional Split based on the first character routes the rows into separate paths.
  • A transformation on each path takes the 2nd column and splits it into the appropriate fields; this has to be done separately for each type of line, attached to the corresponding Conditional Split output.
  • A destination should be configured for each conditional branch to store / handle the data appropriately.
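The steps above can be sketched outside SSIS in a few lines of Python (the per-type fixed-width layouts here are hypothetical, just to show the routing):

```python
# Simulate the two-column source plus conditional split described above.
# The fixed-width field layouts per record type are invented examples.
LAYOUTS = {
    "2": [("batch_id", 0, 4), ("batch_date", 4, 12)],  # batch header
    "4": [("account", 0, 6), ("amount", 6, 14)],       # detail record
}

def split_rows(lines):
    """Route each line to a per-type bucket, parsing its payload."""
    buckets = {code: [] for code in "02468"}
    for line in lines:
        code, rest = line[0], line[1:]          # the two source columns
        layout = LAYOUTS.get(code)
        if layout:                              # per-type field split
            row = {name: rest[a:b].strip() for name, a, b in layout}
        else:
            row = {"raw": rest}                 # headers/footers kept raw
        buckets[code].append(row)
    return buckets

b = split_rows(["2B00120100301", "4ACC01  100.00"])
# b["2"][0] == {'batch_id': 'B001', 'batch_date': '20100301'}
```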

If further ETL work is required to start linking items together, I would import the files into temp tables or a permanent staging area and perform more transformations before committing the final results.

The real hassle is that the data is related, and you have to keep or infer that relationship afterwards. If I had to throw it together, I would use a Script Component in the data flow that increments a counter every time it sees a header record and outputs the counter into the stream. That way each record would carry an identifying number that could be used to relate the records back together afterwards.
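A rough Python equivalent of that script-component idea (the header code and sample rows are taken from the question):

```python
# Tag every row with the count of batch headers seen so far, so the
# header/detail relationship survives once rows are split into tables.
def tag_with_batch(lines, header_code="2"):
    batch_no = 0
    for line in lines:
        if line[0] == header_code:
            batch_no += 1          # new batch header seen
        yield batch_no, line       # every row carries its batch number

rows = list(tag_with_batch(["0FH", "2BH", "4D1", "4D2", "6BF", "8FF"]))
# rows[2] == (1, "4D1"): the first detail row belongs to batch 1
```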

It's a bit complex but the problem can be worked around.

Andrew
This is exactly what my BI team has done in the past. I would also recommend routing any records that don't meet the criteria of the Conditional Split to an "undefined record" table that's basically a text blob, and setting up monitoring to notify you when records land in this table. Every once in a blue moon the data schemas for these kinds of imports change, and you need to be aware of when that happens, since you probably won't know until the data shows up in the file.
Registered User
A: 

Maybe SSIS has a better way to do this, but when I had to do this task years ago in DTS, this is what I did.

First I brought the data into a staging table that had two columns (or three, if I felt I needed an autogenerated id). One column was for the character that indicated the type of record, and one column held all the rest of the data for the line.

Then I separated the data out into normalized tables based on the record type, doing any cleanup along the way.

Then I imported to my production tables.
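A compressed sketch of that staging pattern, using Python's sqlite3 in place of SQL Server (table and column names are invented for illustration):

```python
import sqlite3

# Stage raw lines as (record_type, rest_of_line), then normalize.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (rec_type TEXT, data TEXT)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?)",
    [(ln[0], ln[1:]) for ln in ["0FileHdr", "4Detail1", "4Detail2", "8FileFtr"]],
)

# Separate the detail records out to a normalized table.
conn.execute("CREATE TABLE detail (data TEXT)")
conn.execute("INSERT INTO detail SELECT data FROM staging WHERE rec_type = '4'")
count = conn.execute("SELECT COUNT(*) FROM detail").fetchone()[0]  # 2
```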

HLGEM
A: 

If I were you, and those 6 different character codes were the only ones associated with a specific set of data, then I would import it using the Flat File Source and then switch on each of the codes with a Conditional Split. I am assuming you'd like to maintain the hierarchy, so you could split up the data that way and insert it however you want afterwards.

Using a Script component to import data is usually my last resort, but when a file (like yours) doesn't import correctly, I do like to use script code to format the data before the actual flat file import. I believe writing one formatting application and using it on any file that comes from a given system is more useful than writing a custom script import every time.

ajdams
A: 

I've done things like this many times, but I always pre-processed the data to add a line number to each line. After that it was easy enough to join the table to itself using max/min and comparisons on the line number to keep the sections together.
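In Python terms, that line-number trick amounts to something like the following (record codes as in the question; the max() mirrors the MAX() in a SQL self-join):

```python
# Associate each detail line with the most recent header line above it,
# using line numbers the same way the self-join with MAX() would.
lines = ["0FH", "2BH1", "4D1", "6BF1", "2BH2", "4D2", "8FF"]
numbered = list(enumerate(lines, start=1))

headers = [(n, ln) for n, ln in numbered if ln[0] == "2"]
details = [(n, ln) for n, ln in numbered if ln[0] == "4"]

pairs = [
    (d, max(h for h, _ in headers if h < dn))  # greatest header line < detail line
    for dn, d in details
]
# pairs == [("4D1", 2), ("4D2", 5)]
```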

But that's kind of clumsy. If we use a Conditional Split, can SSIS itself supply a line number? Or can we use an incrementing integer key in place of the line number, without the risk that the conditional would get them out of order?

Wes Groleau