views:

1092

answers:

3

I’m using SSIS to synchronize data between two databases. I’ve used SSIS and DTS in the past, but I generally write an application for things of this nature (I’m coder and it just comes easier to me).

In my package I use a SQL Task that returns about 15,000 rows. I’ve hooked that up to a Foreach Container, and within that I assign the resultset column values to variables, and then map those variables to parameters that are fed to another SQL Task.

The problem I’m having is with debugging, and not just more complicated debugging like breakpoints and evaluating values at runtime. I simply mean that if I run this with debugging rather than without, it takes hours to complete. I ended up rewriting the process in Delphi, and the following is what I came up with:

Full Push of Data:
This pulls 15,000 rows, updates a destination table for each row, then pulls 11,000 rows and updates a destination table for each row.

Debugging:
Delphi App: 139s
SSIS: 4 hours, 46 minutes

Not Debugging:
Delphi App: 132s
SSIS: 384s

Update of Data:
This pulls 3,000 rows, but no updates are needed or made to the destination table. It then pulls 11,000 rows but, again, no updates are needed or made to the destination table.

Debugging:
Delphi App: 42s
SSIS: 1 hours, 10 minutes

Not Debugging:
Delphi App: 34s
SSIS: 205s

The odd thing is, I get the feeling that most of this time spent debugging is just updating UI elements in Visual Studio. If I watch the progress tab, a node is added to a tree for each iteration (thousands total), and this gets slower and slower as the process goes on. Trying to stop debugging usually doesn’t work, as Visual Studio seems caught in a loop updating the UI. If I check the profiler for SQL Server no actual work is being done. I'm not sure if the machine matters, but it should be more than up to the job (quad core, 4 gig of ram, 512 mb video card).

Is this sort of behavior normal? As I’ve said I’m a coder by trade, so I have no problem writing an app for this sort of thing (in fact it takes much less time for me to code an application than “draw” it in SSIS, but I figure that margin will shrink with more work done in SSIS), but I’m trying to figure out where something like SSIS and DTS would fit into my toolbox. So far nothing about it has really impressed me. Maybe I’m misusing or abusing SSIS in some way?

Any help would be greatly appreciated, thanks in advance!

A: 

I have noticed this is the behavior, I had an SSIS package for moves, that did somewhere in the neighborhood of 3 million entries, it was not possible to debug as it would run for about 3-4 days.

SSIS is still the way I did it, I just don't "debug" with SSIS, I run them when working with the full datasets. If I must debug, I use very small datasets.

Mitchel Sellers
+3  A: 

SSIS control flow and loops are not very high performance, and not designed for processing these amounts of data. Especially during the debugging - before and after each task execution, debugger sends notifications to designer process, which updates colors of the shapes and this could be slow.

You could get much better performance using data flow. Data flow does not operate with single rows, it works with buffers of rows - much faster, and the debugger is only notified about beginning/end of the buffers - so its impact is less noticeable.

Michael
+2  A: 

SSIS is not designed to do a foreach like that. If you are doing something for each row coming in, you probably want to read those into a dataflow and then using a lookup or merge join, determine whether to do an INSERT (these happen in bulk) or a database command object for multiple SQL UPDATE commands (a better performing option is to batch these into staging table and do a single UPDATE).

In another typical sync situation, you read all the data into a staging table, and do a SQL Server UPDATE on the existing rows (INNER JOIN) and INSERT on the new rows (LEFT JOIN, rhs IS NULL). There is also the possibility of using linked servers, but joins over that can be slow, since all (or a lot of) the data may have to come across the network.

I have SSIS packages that regular import 24 million rows, including handling data conversion and validation and slowly changing dimensions using the TableDifference component, and it performs relatively quickly for that large amount of data versus a separate client program.

Cade Roux