views: 469

answers: 4

I am working on an SSIS solution for a data warehouse, extracting surrogate keys for the corresponding application keys. I am using the SSIS Lookup task, but the problem with this task is that it caches the complete lookup table in memory, and my lookup table is huge: about 20 million records. Can you suggest some ways to tune this, or alternatives to the Lookup task?

A: 

Do a LEFT JOIN with your lookup data when you bring the data into the SSIS package, and then evaluate what you need to.

If the lookup table is in a different source, you can still do a LEFT JOIN inside SSIS, but that is going to cache rows as well. I think the JOIN may be marginally faster than a Lookup.
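When both tables live in the same database, the join can be pushed into the source query itself, so the surrogate key is resolved by the database engine and SSIS never has to cache the lookup table. A sketch, assuming hypothetical staging and dimension table names (`stg.Sales`, `dim.Customer`) and column names:

```sql
-- Hypothetical schema: stg.Sales is the incoming data, dim.Customer is the
-- 20M-row lookup table holding application key -> surrogate key mappings.
SELECT s.SaleID,
       s.CustomerAppKey,
       c.CustomerSK           -- surrogate key; NULL where no match (LEFT JOIN)
FROM   stg.Sales AS s
LEFT JOIN dim.Customer AS c
       ON c.CustomerAppKey = s.CustomerAppKey;
```

Rows where `CustomerSK` comes back NULL can then be routed to whatever unmatched-key handling the package needs.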

Raj More
but the problem is that the lookup table is in a different source, and caching such a large table at run time doesn't seem to be a feasible option
paranjai
A: 

Do you have to scan the whole table? I.e. can you specify your lookup as a database view on the table, or even as the result of a SQL query (the "Use results of an SQL query" option)?
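The query supplied there can be narrowed to just the key pair the lookup needs, and filtered to the rows that can actually match. A sketch with hypothetical table and column names:

```sql
-- Cache only the two columns the lookup actually uses, and only the
-- rows that can match (IsCurrent is a hypothetical filter column).
SELECT CustomerAppKey, CustomerSK
FROM   dim.Customer
WHERE  IsCurrent = 1;
```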

James Wiseman
+1  A: 

I do not consider a table with 20 million records too huge for a lookup. By filtering, and by selecting only the required columns in the lookup, you can optimize it to use a small amount of memory.

For example, if you have a key column of type int and a varchar column of size 10 needed for the lookup, a record takes roughly 4 + 10 bytes, and 20 million records come to 20M × (4 + 10) ≈ 280 MB, which cannot be considered too high.

Still, if you want to reduce memory usage further, you will have to use joins.

Faiz
A: 

Make sure that you pick only the columns you need in the lookup table; do not cache columns that are not needed. Find some time to take a look at Microsoft's "Project REAL", which uses SSIS in high-data-volume applications and discusses best practices.

Damir Sudarevic