ansaurus

Question

Integration services and identity columns

Answer 1

A:

OK, this is a good news / bad news situation I'm afraid. First the good news and a bit of background which you may know but I'll put it down in case you don't.

You generally can't insert anything into IDENTITY columns. Of course, like everything else in life there are times when you need to and that can be done with the IDENTITY_INSERT option.

SET IDENTITY_INSERT MyTable ON

INSERT INTO MyTable (
       MyIdCol,
       Etc…
       ) 
SELECT SourceIdCol,
       Etc…
  FROM MySourceTable

    SET IDENTITY_INSERT MyTable OFF

Now, you say that you have surrogate keys in the target but then you say that they may clash. So I'm a little confused… Are you using the keys from the source (e.g. IDENTITY columns) or are you generating new keys in the target? I would strongly advise against trying to merge the keyspaces in a single key column. If you need to retain the keys then I would suggest a multi-field key using something like SourceSystemId to keep them unique.

Finally the bad news: SSIS doesn't provide a simple means of using the IDENTITY_INSERT option. The only way I've been able to do it is by turning it on in a SQL task that executes before the insert task. You should be able to pass the table name into the script as a variable. Make sure to include another SQL task afterwards to turn it off because you can only use on one table at a time.

Joe Harris 2009-07-16 15:57:28

@joe harris - I think OLEDB Destination > Fast Load Options > FastLoadKeepIdentity automates the Identity_Insert setting http://msdn.microsoft.com/en-us/library/ms141237.aspx

onupdatecascade 2009-07-23 16:24:15

Answer 2

A:

Chris KL - you are correct that this is harder than one would expect. I have three methods for this, which work in different situations:

IF the data you are loading is small (hundreds or thousands but not hundreds OF thousands) then you can do this: use an OLEDB command that performs one insert for each parent row and returns the identity value back; then downstream from that join the output from that to the child rows, and insert them. Advantage: intuitive. Disadvantage: scales badly. This method is documented on the web and should Google for you.
If we are talking about a bigger system where you need bulk loading, then there are two other flavors:

a. If you have exclusive access to the table during the load (really exclusive, enforced in some way) then you can grab the max existing ID from the table, use an SSIS script task to number the rows starting above that max id, then Set Identity Insert On, stuff them in, and Set Identity Insert Off. You then have those script-generated keys in SSIS to assign to the child rows. Advantage: fast and simple, one trip to the DB. Disadvantage: possible errors if some other process inserts into your table at the same time. Brittle.

b. If you don't have exclusive access, then the only way I know of is with a round trip to the DB, thus: Insert all parent rows but keep track of a key for them that is not the identity column (a business key, for example). In a second dataflow, process the child records by using a Lookup transform that uses the business key to fetch the parent ID. Make sure the lookup is tuned appropriately vs. caching, and that thee business key is indexed.

onupdatecascade 2009-07-23 02:01:16

ansaurus

tags:

views:

answers:

Integration services and identity columns

related questions