views:

30

answers:

2

This is quite a strange problem, wasn't quite sure how to title it. The issue I have is some data rows in an SSIS task which need to be modified depending on other rows.

Name  Location  IsMultiple
Bob   England
Jim   Wales
John  Scotland
Jane  England

A simplifed dataset, with some names, their locations, and a column 'IsMultiple' which needs to be updated to show which rows share locations. (Bob and Jane's rows would be flagged 'True' in the example above).

In my situation there is much more complex logic involved, so solutions using sql would not be suitable.

My initial thoughts were to use an asyncronous script task, take in all the data rows, parse them, and then output them all after the very last row has been input. The only way I could think of doing this was to call row creation in the PostExecute Phase, which did not work.

Is there a better way to go about this?

+1  A: 

If possible, I might recommend doing it with pure SQL in a SQL task on your control flow prior to your data flow. A simple UPDATE query where you GROUP BY location and do a HAVING COUNT for everything greater than 1 should be able to do this. But if this is a simplified version this may not be feasible.

If the data isn't available until after the data flow is done you could place the SQL task after your data flow on your control flow.

Daniel DiPaolo
+3  A: 

A couple of options come to mind for SSIS solutions. With both options you would need the data sorted by location. If you can do this in your SQL source, that would be best. Otherwise, you have the Sort component.

With sorted data as your input you can use a Script component that compares the values of adjacent rows to see if multiple locations exist.

Another option would be to split your data path into two. Do this by adding a Multicast component. The first path would be your main path that you currently have. In the second task, add an Aggregate transformation after the Multicast component. Edit the Aggregate and select Location as a Group By operation. Select (*) as a Count all. The output will be rows with counts by location.

After the Aggregate, Add a Merge Join component and select your first and second data paths as inputs. Your join keys should be the Location column from each path. All the inputs from path 1 should be outputs and include the count from path 2 as an output.

In a derived column, modify the isMultiple column with an expression that expresses "If count is greater than 1 then true else false".

bobs

related questions