Hello,

I need to load dimensions from EDW tables (which maintain historical records) that are stored in a key-value-parameter format.

My scenario is fine if I get records in the EDW like the following:

Key1  Key2  Code  Value  EffectiveDate        EndDate     CurrentFlag
100   555   01    AAA    2010-01-01 11:00:00  9999-12-31  Y
100   555   02    BBB    2010-01-01 11:00:00  9999-12-31  Y

This needs to be loaded into the DM by pivoting it.

The combination of Key1 and Key2 makes the natural key (NK) for the DM:

SK  NK       01   02   EffectiveDate        EndDate     CurrentFlag
1   100-555  AAA  BBB  2010-01-01 11:00:00  9999-12-31  Y
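
(For reference, this pivot is equivalent to a conditional aggregation in T-SQL; a sketch with made-up table names, since the real pivoting happens in the SSIS Pivot transform:)

    SELECT
        CAST(Key1 AS varchar(10)) + '-' + CAST(Key2 AS varchar(10)) AS NK,
        MAX(CASE WHEN Code = '01' THEN Value END) AS [01],
        MAX(CASE WHEN Code = '02' THEN Value END) AS [02],
        EffectiveDate, EndDate, CurrentFlag
    FROM dbo.EDW_KeyValue   -- hypothetical EDW source table
    GROUP BY Key1, Key2, EffectiveDate, EndDate, CurrentFlag;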

My SSIS package does all the pivoting fine. It looks up the incoming NK in the DIM: if it's new, it inserts; otherwise a further lookup on the effective date determines whether the incoming row for the same natural key has a change in any attribute. If so, it expires the current record by setting its end date and inserts a new record with the new attribute value, pulling the most recent record's values for the other attributes.

My problem is that if the same natural key comes twice (with the same attribute) in a single extract, my first lookup, which is on the natural key, lets both records pass and tries to insert both, which fails. If I take distinct records on the NK, the second record is not picked up and I need to run the package again.

So my question is: how can I configure the lookup, or what alternative approach would handle this scenario, so that when the same NK comes twice in a single extract, the first record is inserted if it does not already exist in the Dim table, and the second is applied as an update against the record inserted just before it?

Not sure whether what I am trying to explain makes sense. I will attach a screenshot once I'm back at my work desk (on Monday).

Thanks

A: 

The Lookup transformation is not good for this - with caching and everything, it's simply not able to look up values it has itself set earlier in the same data flow.

You might be better off passing each row to an OLE DB Command and having a stored proc do an insert or an expire/insert depending on what it finds.
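
A minimal sketch of such a proc, with illustrative names (dbo.DimExample and its columns are assumptions based on the tables above; NULL handling is omitted). Because the proc runs once per row, in order, the second occurrence of an NK sees the row inserted by the first:

    CREATE PROCEDURE dbo.UpsertDimExample
        @NK            varchar(20),
        @Code01        varchar(10),
        @Code02        varchar(10),
        @EffectiveDate datetime
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Expire the current row only if an attribute actually changed
        -- (assumes non-NULL attribute values).
        UPDATE dbo.DimExample
        SET    EndDate = @EffectiveDate, CurrentFlag = 'N'
        WHERE  NK = @NK
          AND  CurrentFlag = 'Y'
          AND  ([01] <> @Code01 OR [02] <> @Code02);

        -- Insert a new current row if none exists (new NK or just expired).
        IF NOT EXISTS (SELECT 1 FROM dbo.DimExample
                       WHERE NK = @NK AND CurrentFlag = 'Y')
            INSERT dbo.DimExample
                   (NK, [01], [02], EffectiveDate, EndDate, CurrentFlag)
            VALUES (@NK, @Code01, @Code02, @EffectiveDate, '9999-12-31', 'Y');
    END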

You could also stream the rows to a staging table and do the work in a batch.
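
A sketch of that batch variant, again with assumed names. Note that this applies only the latest staged version of each NK as the new current row; if intermediate versions matter, you would loop over the batch in version order instead:

    -- Pick the latest staged version per NK.
    SELECT NK, [01], [02], EffectiveDate,
           ROW_NUMBER() OVER (PARTITION BY NK
                              ORDER BY EffectiveDate DESC) AS rn
    INTO   #Latest
    FROM   dbo.StageDimExample;   -- hypothetical staging table

    -- Expire current rows whose attributes changed.
    UPDATE d
    SET    d.EndDate = s.EffectiveDate, d.CurrentFlag = 'N'
    FROM   dbo.DimExample d
    JOIN   #Latest s ON s.NK = d.NK AND s.rn = 1
    WHERE  d.CurrentFlag = 'Y'
      AND  (d.[01] <> s.[01] OR d.[02] <> s.[02]);

    -- Insert new current rows where none remains.
    INSERT dbo.DimExample (NK, [01], [02], EffectiveDate, EndDate, CurrentFlag)
    SELECT s.NK, s.[01], s.[02], s.EffectiveDate, '9999-12-31', 'Y'
    FROM   #Latest s
    WHERE  s.rn = 1
      AND  NOT EXISTS (SELECT 1 FROM dbo.DimExample d
                       WHERE d.NK = s.NK AND d.CurrentFlag = 'Y');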

To address your flow and the model it's trying to populate:

To start with, it's always awkward when the order of rows in the input causes behavioral differences - i.e. NK = A, Val = 1 then NK = A, Val = 2 gives different behavior than NK = A, Val = 2 then NK = A, Val = 1. One has to wonder if this is the correct dimensional design. Remember that dimensional attributes are assigned to dimension tables based on pragmatic choices; ultimately, attributes can be arranged into tables at will, so a different design might make more sense. If things are changing within a single load, that may indicate you need to break up that load to match the grain (not attempting to load two days' data at one time).

I notice that there is an Effective Date and an End Date in your dimension. Right now this sounds a lot like a property of the dimension's behavior (where your 01 and 02 codes are changing on a NK) and not of the facts that this dimension is going to be attached to. This might indicate that it needs to be tracked in a separate factless fact table, say (SK, EffectiveDate, EndDate) - or that it's just not important, because all you care about is a NK, 01, 02 combination attached to a fact, in which case your natural key is really all of NK, 01, 02.
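
A sketch of the kind of factless fact table that describes (names are illustrative):

    -- Tracks code-assignment validity periods separately from the dimension.
    CREATE TABLE dbo.FactCodeAssignment (
        SK            int      NOT NULL,  -- surrogate key of the dimension row
        EffectiveDate datetime NOT NULL,
        EndDate       datetime NOT NULL,
        CONSTRAINT PK_FactCodeAssignment PRIMARY KEY (SK, EffectiveDate)
    );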

I recommend going back to your fact table, the incoming feed, and the expected usage, looking at them more closely, and seeing whether this needs to be a separate factless fact table tracking these dimension changes.

Also, if you could post more details, that might help, and I'll see what Kimball's materials have to say about it when I see more of a business case.

Cade Roux
Thanks Cade, that makes sense. Suppose my approach is to join the two data sources, pivot, and then pass the rows through a Script Component that works out the target records by pulling the other attribute values for the new incoming attributes, then dump the result into a staging table and merge it with the actual dim. What would be the performance hit of using a Script Component in this scenario?
Nev_Rahd
@Nev_Rahd Anything done on a row-by-row basis is going to be slower against the DB. I think you really need to look at your model. I've added some more material. The model should match the feed, to the extent that how the data is modeled should reflect how the data behaves in terms of sub-codes "wiggling" over time. I think this is a modeling issue more than an ETL issue, because I don't think that when you query the fact tables related to this dimension you are going to be using Effective Date and End Date - they relate to the code assignments and not to the (current) facts.
Cade Roux
@Nev_Rahd The facts in the feed that are going to be assigned to these dimensions are physically tied to them by the SK, so the effective date isn't going to matter when querying the DB.
Cade Roux
A: 

Cade's comments are spot on - but I believe your main issue is the duplicates, period. Does the fact that you have two versions of the same NK in the "source" stream indicate two separate, meaningful versions? Or does only the "last" version matter?

If the changes reflected in both versions should be reflected in your dimension table, then I echo Cade's suggestion to divide your processing into batches. You could sort your input by the NK (and time of change), then use a row-counting script to enumerate the "versions" of each NK, and then process "batches" by version number.
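
One way to do that numbering at the source query rather than in a script (a sketch; the table name is made up):

    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY Key1, Key2
                              ORDER BY EffectiveDate) AS VersionNo
    FROM   dbo.EDW_KeyValue;
    -- Then run the dimension load once per VersionNo: all the 1s,
    -- then all the 2s, and so on.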

If only the last "version" needs to be incorporated into the dimension table, I suggest you eliminate the duplicates prior to using a Lookup (e.g. with the same ROW_NUMBER pattern sketched above, ordered descending and keeping only row 1 per NK).

Todd McDermid