views:

695

answers:

3

SSIS - I have data loaded and various transformations on the data complete, the problem is there is a parent/child relationship managed in the data - best explained by an example

each row has (column names are made up) row_key parent_row_key row_name parent_row_name

some rows have row_key == parent_row_key (their own parent) some rows relate to another row (row 25 is the parent to row 44 for example).

In this case, row 25 is parent to row 44. I need to put row 25's row_name in row 44's parent_row_name. How do I query the data in the pipeline for the value?

thanks.

+3  A: 

Can you not just split the data using a multicast and then do a merge-join against itself?

piers7
certainly an avenue worth trying - the hardest part of the product is learning how to use it to solve problems.
WIDBA
Needless to say it'd be easier to get whatever system you're querying to do it for you before you bring it into SSIS, but I guess you wouldn't be asking if that were appropriate in your case...
piers7
correct - I may have to persist the data and then do the analysis via some simple Sql Tasks -> at least till I get the hang of the product more.
WIDBA
+1 - I do find though that the sorting required for merging can cause serious slowdown in larger datasets (Anything over 100k rows). Sorting at the datasource (IE: Your query has ORDER BY) helps tremendously.
Meff
A: 

You could write your data to a temp table in your database, a raw file destination or recordset destination (depending on the size of your dataset). Then you could run through you data again and query your temp data and find the correct parent.

Emil Rasmussen
A: 

Sounds like the way Macola (Exact Software) handles transactions. In their situation, a transaction is added to a table and it includes the reference to the original transaction in the row - for example, if the original transaction was a purchase in january and there is a payment in febrary, the february payment will include the transaction number from the january payment in a column refering to the parent. Since there is no foreign key relationship, each initial transaction is given a guid that can move between ledgers across the system.