We have a process that is getting data in real time and adding records to a database. We're using SQL Server 2008 Integration Services to run our Extract Transform Load (ETL) process. We download about 50 files from an FTP site, process them and then archive the files.

The problem is that the processing takes about 17 seconds per file, even though the files are really small (about 10 lines) and the processing code is fairly simple. Looking at the load on the machine, the process is CPU bound: there is little activity on the network, disk, or memory.

I suspect that SSIS might be re-compiling the C# code every time it is run. Has anyone run into similar problems? Or have you used a similar process without problems?

Are there any tools that can allow us to profile a dtsx package?

+3  A: 

Since you're using SSIS 2008, your Script Tasks are always precompiled.

Nissan Fan
That's very helpful. Thanks. Do you know of any way to profile the process and find where most of the time is being spent?
scurial
Here are some common tips that let you enable logging across your package to get an idea of the cost of various operations. Without knowing what your Script Tasks do, it's hard to say what the problem may be. http://msdn.microsoft.com/en-us/library/ms141031.aspx
Nissan Fan
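Once logging is enabled, one way to see where the time goes is to pair each task's OnPreExecute and OnPostExecute log events and add up the elapsed time per task. The following is only a rough sketch in Python, not SSIS code: the `(source, event, timestamp)` tuple layout and the `task_durations` helper are assumptions made for illustration, and you would need to map them to whatever your log provider actually writes.

```python
from collections import defaultdict
from datetime import datetime


def task_durations(events):
    """Sum elapsed seconds per task from start/end log events.

    `events` is an iterable of (source, event, timestamp) tuples in time
    order. This layout is an assumption for the sketch, not the SSIS schema.
    """
    started = {}                 # task name -> timestamp of its pending start
    totals = defaultdict(float)  # task name -> accumulated seconds
    for source, event, ts in events:
        if event == "OnPreExecute":
            started[source] = ts
        elif event == "OnPostExecute" and source in started:
            totals[source] += (ts - started.pop(source)).total_seconds()
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample data: one loop iteration with two tasks.
    sample = [
        ("Script Task", "OnPreExecute", datetime(2010, 1, 1, 12, 0, 0)),
        ("Script Task", "OnPostExecute", datetime(2010, 1, 1, 12, 0, 12)),
        ("Data Flow Task", "OnPreExecute", datetime(2010, 1, 1, 12, 0, 12)),
        ("Data Flow Task", "OnPostExecute", datetime(2010, 1, 1, 12, 0, 17)),
    ]
    for task, seconds in task_durations(sample).items():
        print(f"{task}: {seconds:.1f}s")
```

Because the totals accumulate across loop iterations, a task that runs once per file ends up with its combined cost over all 50 files, which makes the expensive step easier to spot.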
+1  A: 

Are you sure it's the script task in the first place?

I had some extensive script tasks that built many dictionaries, checked whether an incoming value was in various dictionaries according to crazy complex business logic, and did a translation or other work. By building the dictionaries once in the task initialization instead of in the per-row method, the processing improved vastly, as you might expect. But this was a very special case.
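To make the difference concrete, here is a minimal sketch of the two shapes, written in Python for brevity rather than as SSIS code. `load_lookup`, `translate_slow`, and `translate_fast` are hypothetical names, and the dictionary contents are made up. In an SSIS Script Component, the one-time setup would typically live in an override such as PreExecute rather than in the per-row ProcessInputRow method.

```python
BUILD_COUNT = {"n": 0}  # tracks how many times the lookup gets rebuilt


def load_lookup():
    """Hypothetical stand-in for an expensive dictionary build."""
    BUILD_COUNT["n"] += 1
    return {f"code{i}": f"value{i}" for i in range(1000)}


def translate_slow(rows):
    """Anti-pattern: rebuilds the lookup for every single row."""
    out = []
    for row in rows:
        lookup = load_lookup()  # expensive work repeated per row
        out.append(lookup.get(row, row))
    return out


def translate_fast(rows):
    """Builds the lookup once, then reuses it for all rows."""
    lookup = load_lookup()  # one-time initialization
    return [lookup.get(row, row) for row in rows]


if __name__ == "__main__":
    rows = ["code1", "code2", "unknown"]
    translate_slow(rows)
    print("slow builds:", BUILD_COUNT["n"])
    BUILD_COUNT["n"] = 0
    translate_fast(rows)
    print("fast builds:", BUILD_COUNT["n"])
```

Both functions return the same translated rows; only the number of dictionary builds differs, and that is where the CPU time goes when the build is costly.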

The package components will be validated, either at the beginning or right before each control flow component is run. That's some overhead you can't get away from.

Are you processing all the files in a single loop within SSIS? In that case, the data flow validation shouldn't be repeated.

Cade Roux
The files are being processed in a single loop inside the control flow. The data flow contains no looping but does contain 3 steps that run C# scripts. Should I move the loop out of the control flow?
scurial
@scurial Is DelayValidation set on your data flow inside the loop? I think it should only validate once, but inside the loop, I'm not sure. Is it possible to combine the 3 scripts? How complex are the scripts? Is a lot of data initialized in the row processing method that would be better initialized outside it?
Cade Roux