If I can meet my ETL requirements using stored procedures, are there any advantages to using SSIS packages instead? My ETL work is nothing major.

I feel like I'm using an old technology, but I like SQL. Old does not mean obsolete, as stored procedures won't go away any time soon.

A: 

I don't see any obvious technical limitations. The stored procedure might be more difficult to follow than an SSIS package for complex ETL operations - but that isn't going to be true for every scenario. I have also found that packages (SSIS and DTS) are more readily recognized as "jobs" - the stored procedures that are executed by scheduled jobs are often overlooked by developers because they can't see the scheduled jobs.

That said, I have seen ETL performed by stored procedures and DTS/SSIS packages alike and as long as the stored procedure isn't a large mess of tangled code it seems appropriate. I haven't seen one method perform better or more reliably than another (but then I haven't seen stored procedures doing complex ETL).

Mayo
+1  A: 

I would say it depends somewhat on what you are doing. However, in my experience the room for improvement with SSIS packages is tremendous. We saw tenfold improvements in our data warehouse environment when we took some of the heavy-hitting stored procedures and moved them into SSIS packages. The memory utilization of SSIS (in this situation, anyway) made all the difference.

I want to reiterate that it is important to know what you are doing. For example, a SQL statement will usually outperform an SSIS data flow when the data transform is table-to-table on the same server.
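As a rough illustration of the table-to-table case, here is a minimal sketch of a transform done in one set-based statement. It uses Python's built-in `sqlite3` module as a stand-in for SQL Server so that it runs anywhere; the table names, column names, and the function `load_orders_set_based` are hypothetical and not taken from any real system.

```python
import sqlite3


def load_orders_set_based(conn):
    """Transform and load in one INSERT ... SELECT, with no row-by-row loop."""
    conn.execute(
        """
        INSERT INTO dw_orders (order_id, amount_usd)
        SELECT order_id, amount_cents / 100.0   -- the "T": cents to dollars
        FROM staging_orders
        WHERE amount_cents IS NOT NULL          -- skip rows with no amount
        """
    )
    conn.commit()


# Hypothetical source and target tables living in the same database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE staging_orders (order_id INTEGER, amount_cents INTEGER);
    CREATE TABLE dw_orders (order_id INTEGER, amount_usd REAL);
    INSERT INTO staging_orders VALUES (1, 1250), (2, NULL), (3, 999);
    """
)
load_orders_set_based(conn)
rows = conn.execute(
    "SELECT order_id, amount_usd FROM dw_orders ORDER BY order_id"
).fetchall()
```

The data never leaves the database engine here, which is the situation where a plain SQL statement tends to have the edge over a data flow.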

The best bet is to pick an SP or two, recreate them in SSIS, and test both versions.

Seems like the answer to every SQL question starts with "It depends..."

Irwin M. Fletcher
I would agree here - if your stored procedure ETL is taking too long (i.e. more than a few minutes?) then you would want to consider SSIS as an alternative for performance reasons. :)
Mayo
A: 

I tried some of the features in SSIS and was not happy with all of them. I started off with the data flow component and was not happy with the performance I saw. What I ended up doing was developing an SSIS package with a control flow of Execute SQL tasks, each of which executed a stored procedure.

This made sure that SQL Server did most of the E, the T, and the L. I think that when you use the data flow component, the data actually moves from SQL Server to the machine running the package, which makes it less efficient.

Having said that, I think I would have tried to optimize the data flow (it's been a while since I worked on it) if I had to interact with third-party applications, databases, or DW systems.

ps
+5  A: 

If your ETL is mostly E and L, with very little T, and if you can write your SPs so they don't rely on cursors, then going the SP-only route is probably fine.
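To make the cursor point concrete, here is a small sketch comparing a cursor-style row loop with a single set-based statement that produces the same result. It uses Python's built-in `sqlite3` module rather than SQL Server, and the `customers` table and all function names are hypothetical.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database with messy customer names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(1, "  alice "), (2, "BOB"), (3, " Carol")],
    )
    return conn


def clean_row_by_row(conn):
    """Cursor-style: read every row, then issue one UPDATE per row."""
    for cust_id, name in conn.execute("SELECT id, name FROM customers").fetchall():
        conn.execute(
            "UPDATE customers SET name = ? WHERE id = ?",
            (name.strip().upper(), cust_id),
        )


def clean_set_based(conn):
    """Set-based: one UPDATE that the engine applies to all rows."""
    conn.execute("UPDATE customers SET name = UPPER(TRIM(name))")


def names(conn):
    return [row[0] for row in conn.execute("SELECT name FROM customers ORDER BY id")]


a, b = make_db(), make_db()
clean_row_by_row(a)
clean_set_based(b)
```

Both databases end up with the same cleaned names; the difference is that the loop issues one statement per row while the set-based version issues one statement in total.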

For more complex processes, particularly those that involve heavy transforms, slowly changing dimensions, data mining lookups, etc, SSIS has three advantages.

First, it manages memory very efficiently, which can result in big performance improvements compared to T-SQL alone.

Second, the graphical interface lets you build large, complex and reliable transforms much more easily than hand-crafted T-SQL.

And third, SSIS lets you more easily interact with additional external sources, which can be very handy for things like data cleansing.

RickNZ
I would only use SSIS if you are moving data from one instance to another, or if you want your ETL to easily scale that way. If you are doing ETL on different databases on the same instance I would keep it simple and use T-SQL. I primarily use SSIS as a workflow engine to move data from one place to another and then call T-SQL procedures.
Jason Cumberland
+2  A: 

I've lived in the land of stored procedure ETL for a multi-terabyte SQL Server data warehouse. This decision was made back in 2001 when .NET was 1.0, so VB6 was the programming language alternative, and SSIS wasn't around yet - it was DTS. I can tell you that there were advantages and disadvantages, like anything.

Some considerations:

  1. If everyone on your team understands SQL, it's easy to dig into the stored procs. SQL is a widely known skill which may be a benefit if you have a lot of ETL writers/readers. You have to be more than a casual user of SSIS in order to understand what it's doing. The high level graphical flow is nice for documentation, but if someone needs to get into the guts, they'd better know SSIS well.
  2. SQL is a pain to modularize. If you use UDFs, you are going to incur a huge performance hit. You'll write similar code in multiple places and you'll hate yourself for doing it, but often in ETL scenarios performance is king. SSIS will help you modularize and factor out your tasks.
  3. Don't expect to be able to easily use source control with SSIS. SQL - no problem. SSIS uses awful XML files which can be checked in, but good luck diffing with previous versions to see what changed and when.
  4. You need to think about your SPs in a modular way, even though it's hard to make them as modular as you'd like. Use temp tables to chunk up your processing. Put indexes on those temp tables before you use them. Don't try to do too much at once. Comment everything.
  5. If you're using cursors, you're doing it wrong. Don't be afraid to chain in some external console app you wrote in the language of your choice to do some things SQL just wasn't cut out for.
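
The temp table advice in point 4 could look roughly like the sketch below. It uses Python's built-in `sqlite3` module as a stand-in for SQL Server, and every table, index, and function name is hypothetical. Each step is a small statement of its own, and the staging table is indexed before the next step reads from it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_sales (sale_id INTEGER, region TEXT, amount REAL);
    CREATE TABLE region_totals (region TEXT PRIMARY KEY, total REAL);
    INSERT INTO raw_sales VALUES
        (1, 'north', 10.0), (2, 'south', 5.0), (3, 'north', 2.5), (4, NULL, 7.0);
    """
)


def load_region_totals(conn):
    # Step 1: stage only the rows this chunk of work needs in a temp table.
    conn.execute(
        "CREATE TEMP TABLE stage_sales AS "
        "SELECT sale_id, region, amount FROM raw_sales WHERE region IS NOT NULL"
    )
    # Step 2: index the temp table before anything reads from it.
    conn.execute("CREATE INDEX temp.ix_stage_region ON stage_sales (region)")
    # Step 3: a separate, small statement does the aggregation and the load.
    conn.execute(
        "INSERT INTO region_totals (region, total) "
        "SELECT region, SUM(amount) FROM stage_sales GROUP BY region"
    )
    # Step 4: drop the staging table so the procedure can be run again.
    conn.execute("DROP TABLE stage_sales")
    conn.commit()


load_region_totals(conn)
totals = conn.execute(
    "SELECT region, total FROM region_totals ORDER BY region"
).fetchall()
```

In T-SQL the staging table would typically be a `#temp` table inside the stored procedure, but the shape is the same: stage, index, then process in small commented steps.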

BTW - after I left that company, they finally upgraded the database from SQL 2000 to 2008 and slowly moved from stored procs to SSIS. At my new company, we own SSIS but after using it we all agreed that our custom written .NET ETL is a better fit for our purposes. Everyone takes their own route. The decision has to balance maintenance and performance and the skill-set of your team and the skill-set of the job pool in your area.

mattmc3
A: 
  1. Performance will be faster than a normal SP. You do not need to create complex temp tables, cursors, or indexes to retrieve data.

  2. Data cleansing is an advantage of SSIS.

  3. Incremental handling is only possible in SSIS.

  4. We can create a package configuration file and deploy it to any server. The user can provide the server details and login information.

  5. It has a graphical user interface.

  6. Logging and error handling are best in SSIS.

Ashis Das