tags:

views:

90

answers:

1

I'm using SQL Server 2005 to develop an SSIS package. My dataflow contains a SSIS lookup. Each of the rows I'm processing has these columns (among others): ID1, detailsID1, ID2, detailsID2. The value of column detailsID1 should be loaded from a lookup based on ID1, and the value for detailsID1 from another lookup based on ID2. The thing is that these two lookups are identical: I make the same select for the same columns on the same table (ID1 and ID2 are some IDs of records containted by one table). Is there any way that I can use only one lookup to load both details columns? I don't want to use 2 lookups because that table is huge.

Do you have any ideea about how could I do this? Any idea is greatly appreciated!

A: 

The Lookup component can be a pain for large tables. You can try a couple of options.

First, use a SQL statement that joins the lookup table to the source data. This would require multiple data flow tasks. Your existing data flow would output to a destination staging table in the database. Then you can create a new data flow where the source is the a query that joins your lookup table once for each lookup to the staging table. An example of the query might be:

SELECT lt1.detailsID1, lt2.detailsID2, <other StagingTable columns>
FROM StagingTable st
LEFT OUTER JOIN LookupTable lt1 ON st.ID1 = lt1.ID1
LEFT OUTER JOIN LookupTable lt2 ON st.ID2 = lt2.ID2

The lookups are completed as part of the SQL query.

Another option is to do the join process using data flow components. Note, that this one will use a lot of memory on the computer where the SSIS package runs.

Using your current data flow add a new source that gets the data from the lookup table. Add a Multicast component to the data flow and create a data flow path from the source output to the Multicast input.

Add a Merge Join component and add the output from the primary source as the left input and add the first Multicast output as the right input. Note that the inputs must already be sorted in the sequence of the join key (ID1 for the first join and ID2 for the second join -- see below). Configure the Merge Join as a left outer join with match on ID1 columns. Include the DetailID1 column in the output

Add a Sort component and a second Merge Join. The output from the first Merge join goes to the Sort component. Set the sort key to the ID2 column. The Sort output goes to the second Merge Join component as the Left join input. The second Multicast output goes to the second Merge Join input. Configure the second Merge Join the same as the first Merge Join, but for the ID2 column.

alt text

As you may see, this process is a lot more work. The first solution will probably be your best solution, but you should compare the solutions for best performance.

bobs
Thank you so much for your answer. I was somehow hoping for a more straight forward approach :). Since I need it to be fast and I need to make it fast (less than 1 day), I'll go for 2 lookups with caching enabled - not the best option, but it's an option. Thanks again for your answer!
Diana
Here is a case from Todd McDermid's blog for the multiple lookups. http://toddmcdermid.blogspot.com/2010/09/parallelism-in-ssis-multiple-lookups.html
CTKeane
Nice article, thank you for bringing it into my attention.
Diana