I'm sure this is a pretty vague question that is difficult to answer, but I would be grateful for any general thoughts on the subject.

Let me give you a quick background.

A decade ago, we used to write data loads that read input flat files from legacy applications and loaded them into our Datamart. Originally, our load programs were written in VB6 and cursored through the flat file, performing this general process for each record:

1) Look up the record. If found, update it.
2) Else, insert a new record.
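In rough T-SQL terms, the old pattern looked something like the sketch below. (The actual lookups happened in VB6 against the flat file; the Staging/Target table and column names here are made up purely for illustration.)

    -- Record-at-a-time upsert: one lookup and one write per row
    DECLARE @Key int, @Value varchar(50);
    DECLARE rec CURSOR FOR
        SELECT BusinessKey, SomeValue FROM Staging;
    OPEN rec;
    FETCH NEXT FROM rec INTO @Key, @Value;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        IF EXISTS (SELECT 1 FROM Target WHERE BusinessKey = @Key)
            UPDATE Target SET SomeValue = @Value WHERE BusinessKey = @Key;
        ELSE
            INSERT INTO Target (BusinessKey, SomeValue) VALUES (@Key, @Value);
        FETCH NEXT FROM rec INTO @Key, @Value;
    END
    CLOSE rec;
    DEALLOCATE rec;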

Then we ended up changing this process to use SQL Server DTS to load the flat file into a temp table, and then we would perform a massive set-based join between the temp table and the target production table, using the data from the temp table to update the target table. Records that didn't join were inserted.

This is a simplification of the process, but essentially the process went from an iterative approach to a set-based one, no longer performing updates one record at a time. As a result, we got huge performance gains.
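For comparison, the set-based version is essentially two statements over the whole staging table (same made-up table names as above; MERGE wasn't available until SQL Server 2008, so on DTS-era SQL Server this was done as an UPDATE/INSERT pair):

    -- Update every target row that has a match in staging, in one statement
    UPDATE t
    SET    t.SomeValue = s.SomeValue
    FROM   Target AS t
    JOIN   Staging AS s ON s.BusinessKey = t.BusinessKey;

    -- Insert the staging rows that didn't join to anything
    INSERT INTO Target (BusinessKey, SomeValue)
    SELECT s.BusinessKey, s.SomeValue
    FROM   Staging AS s
    WHERE  NOT EXISTS (SELECT 1 FROM Target AS t
                       WHERE t.BusinessKey = s.BusinessKey);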

Then we created what was in my opinion a powerful set of shared functions in a DLL to perform common functions/update patterns using this approach. It greatly abstracted the development and really cut down on the development time.

Then Informatica PowerCenter, an ETL tool, came along, and management wants to standardize on the tool and rewrite the old VB loads that used DTS.

I have heard that PowerCenter processes records iteratively, but I know it does some optimization tricks, so I am curious how Informatica would perform in practice.

Does anyone have experience with both Informatica and DTS/SSIS who can make a gut performance prediction as to which would generally perform better?

+2  A: 

I joined an organization that used Informatica PowerCenter 8.1.1. Although I can't speak for Informatica setups in general, I can say that at this company Informatica was exceedingly inefficient. The main problem was that Informatica generated some really heinous SQL on the back end. When I watched what it was doing with SQL Server Profiler and reviewed the text logs, it was generating separate insert, update, and delete statements for each row that needed to be inserted/updated/deleted. Instead of trying to fix the Informatica implementation, I simply replaced it with SSIS 2008.

Another problem I had with Informatica was managing parallelization. In both DTS and SSIS, parallelizing tasks is pretty simple: don't define precedence constraints between them and they will run in parallel. In Informatica, you define a starting point and then define the branches for running processes in parallel. I couldn't find a way to limit the number of parallel processes unless I explicitly defined them by chaining the worklets or tasks.

In my case, SSIS substantially outperformed Informatica. Our load process with Informatica took about 8-12 hours. Our load process with SSIS and SQL Server Agent jobs took about 1-2 hours. I am certain that had we properly tuned Informatica we could have reduced the load to 3-4 hours, but I still don't think it would have done much better than that.

Registered User
And of course, SSIS comes with SQL Server, so you get to use it at no additional charge.
HLGEM
