views:

698

answers:

2

I am a bit of an SSIS newbie and while the whole system seems straightforward, I don't conceptually understand the process I need to go through in this scenario:

  1. Need to map Invoice and InvoiceLine tables from a source database to two equivalent tables in a destination database - with different identity values.

  2. For each invoice inserted across, I need to get the identity it was assigned and then insert all its lines referencing that new identity

  3. There is a surrogate key on the invoices (the invoice number), however these might also clash with invoice numbers in the target system, hence they would also have to be renumbered.

This must be a common scenario in integration - is there a common solution?

A: 

OK, this is a good news / bad news situation I'm afraid. First the good news and a bit of background which you may know but I'll put it down in case you don't.

You generally can't insert anything into IDENTITY columns. Of course, like everything else in life there are times when you need to and that can be done with the IDENTITY_INSERT option.

SET IDENTITY_INSERT MyTable ON

INSERT INTO MyTable (
       MyIdCol,
       Etc…
       ) 
SELECT SourceIdCol,
       Etc…
  FROM MySourceTable

    SET IDENTITY_INSERT MyTable OFF

Now, you say that you have surrogate keys in the target but then you say that they may clash. So I'm a little confused… Are you using the keys from the source (e.g. IDENTITY columns) or are you generating new keys in the target? I would strongly advise against trying to merge the keyspaces in a single key column. If you need to retain the keys then I would suggest a multi-field key using something like SourceSystemId to keep them unique.

Finally the bad news: SSIS doesn't provide a simple means of using the IDENTITY_INSERT option. The only way I've been able to do it is by turning it on in a SQL task that executes before the insert task. You should be able to pass the table name into the script as a variable. Make sure to include another SQL task afterwards to turn it off because you can only use on one table at a time.

Joe Harris
@joe harris - I think OLEDB Destination > Fast Load Options > FastLoadKeepIdentity automates the Identity_Insert setting http://msdn.microsoft.com/en-us/library/ms141237.aspx
onupdatecascade
A: 

Chris KL - you are correct that this is harder than one would expect. I have three methods for this, which work in different situations:

  1. IF the data you are loading is small (hundreds or thousands but not hundreds OF thousands) then you can do this: use an OLEDB command that performs one insert for each parent row and returns the identity value back; then downstream from that join the output from that to the child rows, and insert them. Advantage: intuitive. Disadvantage: scales badly. This method is documented on the web and should Google for you.

  2. If we are talking about a bigger system where you need bulk loading, then there are two other flavors:

a. If you have exclusive access to the table during the load (really exclusive, enforced in some way) then you can grab the max existing ID from the table, use an SSIS script task to number the rows starting above that max id, then Set Identity Insert On, stuff them in, and Set Identity Insert Off. You then have those script-generated keys in SSIS to assign to the child rows. Advantage: fast and simple, one trip to the DB. Disadvantage: possible errors if some other process inserts into your table at the same time. Brittle.

b. If you don't have exclusive access, then the only way I know of is with a round trip to the DB, thus: Insert all parent rows but keep track of a key for them that is not the identity column (a business key, for example). In a second dataflow, process the child records by using a Lookup transform that uses the business key to fetch the parent ID. Make sure the lookup is tuned appropriately vs. caching, and that thee business key is indexed.

onupdatecascade