
I would be interested to hear tips from other developers on SSIS best practices, dos and don'ts, pitfalls, tricks, etc.

I'm trying to introduce a few here at my company.

+3  A: 

This blog post by Jamie Thomson is really good; it covers both best practices and naming conventions: Best Practices and Naming Conventions

rfonn
+2  A: 

There is a 4000-character limit to the DT_WSTR (Unicode string) data type used by SSIS. Since expressions are converted to DT_WSTR, they also cannot exceed 4000 characters.

You'll get an error if you exceed the limit at design time. However, you will not receive any obvious warnings or errors if you do so at run time. The operation will fail silently, and the expression may be evaluated incorrectly.

Note: this does not mean that strings or SSIS variables cannot exceed 4000 characters. The limitation only applies to expressions and other places that DT_WSTR is used. For a good explanation of this, take a look at:

http://blogs.conchango.com/jamiethomson/archive/2009/05/27/4000-character-limit-in-ssis.aspx

Here's an example that reproduces the issue:

  1. Create an SSIS string variable named testVariable.
  2. Assign the variable a long value in a script task (5000 characters, for instance). The assignment should succeed.
  3. Create a second task and connect the two tasks using a precedence constraint. Set the evaluation operation to "Expression" and set the expression to @testVariable != "".

Even though the variable is not empty, the constraint will incorrectly evaluate to False because the evaluated expression exceeds 4000 characters. No error is raised, however, and the package will not fail.

If you notice strange behavior in your SSIS logic, take a look at your expressions and ensure that they don't exceed 4000 characters when evaluated.

Andy West
+2  A: 

This is another great blog post about configuration files for porting SSIS packages between environments - a huge issue we ran into when we first started using SSIS in a tiered environment.

http://jessicammoss.blogspot.com/2008/05/ssis-configuration-to-configuration-to.html

ajdams
+1  A: 

Put your SSIS packages in source control.

We have found that using a data source query of `select field1, field2 from mytable` is faster than using the option to select all records from the table. This made a huge difference in one of the packages we built, where we preprocessed data into a staging table for an export.
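
A minimal sketch of that kind of source query (the table and column names here are placeholders, not from the original package):

```sql
-- Preferred: name only the columns the data flow actually uses,
-- so SQL Server returns narrower rows and SSIS buffers fill faster.
SELECT field1, field2
FROM dbo.mytable;

-- Avoid: the "Table or view" access mode effectively issues
-- SELECT * and pulls every column through the pipeline.
```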

Build a standard import package and a standard export package that already have the variables, connections, and error handling you commonly need. It is much faster to build a package by opening a template package, saving it under a new name, and then adjusting it for the particular circumstances than it is to build everything from scratch each time.

A couple of things not specific to SSIS: if you send an export to another company, make sure to copy the file to an archive location as well as to the FTP site. This will help tremendously when there is a question about what was sent and when. Never import data from another source without checking whether the data needs to be cleansed. An address missing the city and state is useless; an email that says "Talk to his secretary" is also useless. Don't add this stuff to your database. Don't try to insert records that will fail. If first name is a required field in your database and the import file doesn't have a value for one of the records, either supply a default value ("Unknown") or skip the record.
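
A hedged T-SQL sketch of the default-value idea, assuming hypothetical staging and target tables (all names here are made up for illustration):

```sql
-- Substitute a default for missing first names, and skip rows
-- that would still violate a NOT NULL constraint on the target.
INSERT INTO dbo.Customers (FirstName, LastName, Email)
SELECT ISNULL(NULLIF(LTRIM(RTRIM(FirstName)), ''), 'Unknown'),
       LastName,
       Email
FROM dbo.StagingCustomers
WHERE LastName IS NOT NULL;  -- don't attempt rows that will fail the insert
```

In SSIS itself the same effect can be achieved with a Derived Column transformation before the destination; the SQL form is just the easiest way to show the rule.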

Log errors and steps processed. This will help tremendously in figuring out where something went wrong when that package that has run flawlessly for six months suddenly fails.
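
One common way to do this (a sketch only, not tied to any particular logging framework; the table and column names are hypothetical) is a simple audit table that each package writes to from Execute SQL Tasks and error handlers:

```sql
-- Hypothetical audit table for package runs and errors.
CREATE TABLE dbo.EtlRunLog (
    LogId       INT IDENTITY(1,1) PRIMARY KEY,
    PackageName NVARCHAR(260)  NOT NULL,
    StepName    NVARCHAR(260)  NOT NULL,
    Status      NVARCHAR(20)   NOT NULL,  -- e.g. 'Started', 'Succeeded', 'Failed'
    Message     NVARCHAR(4000) NULL,
    LoggedAt    DATETIME NOT NULL DEFAULT GETDATE()
);

-- Executed from an Execute SQL Task at the start and end of each step,
-- with the ? parameters mapped from SSIS variables.
INSERT INTO dbo.EtlRunLog (PackageName, StepName, Status, Message)
VALUES (?, ?, ?, ?);
```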

Use variables and configurations to change things at run time. This lets you have test locations and prod locations, so you aren't putting the test file you don't yet want the client to see on his ftp site.

Use the Trash Destination (you can download it here: http://www.sqlis.com/post/Trash-Destination-Adapter.aspx) as your destination until you have tested enough to know that what you are sending to the destination is what you intended to send. In conjunction with this, use the Data Viewer to see what your data looks like at various points in the data flow.

HLGEM
+1  A: 

For fairly large packages, or whenever possible, use containers to help a future reader understand the package easily. A grouping container of this kind is just like any other container except that it has no programming logic behind it; it simply groups related items together for easier understanding of the package. The idea behind it is that the human mind cannot easily take in more than 5-6 items at once.

+1  A: 

Additionally, use the Version and Version Description properties as much as you can. Although there is a package history (right-click and choose View History), this adds an additional layer of documentation (annotations are good but are not always practical due to time constraints). Version / Version Description helps because the version number in the property can be coordinated with the revision-history log in the ETL design documentation.

Hmm, I'm not quite seeing what you are referring to; can you advise further on where to see this? Thanks.
James Wiseman
+1  A: 

My own answer to this, which I'll add to over time.

  1. Use BIDS Helper (http://www.codeplex.com/bidshelper)
  2. Remember that package configuration settings read from the registry are taken from HKCU. This has an impact if you run the job from SQL Agent, as these settings are applied against the SQL Agent user account.
  3. For Bulk Data transfer of large volumes, ensure you select 'Table Or View - Fast Load' on the destination.
  4. In general for large volumes, read up further on settings on the destination to help you tune the performance.
  5. Package Storage - MSDB vs File System - unless you intend to take advantage of the SSIS security options, it's almost always better to use the file system (see http://pragmaticworks.com/community/blogs/nayanpatel/archive/2008/11/13/what-are-the-advantages-disadvantages-of-storing-ssis-packages-to-msdb-vs-file-system.aspx)
James Wiseman
+1  A: 

You can also look at the BI Reference Implementation Microsoft did back in 2005. It's called Project REAL. It used real data from Barnes & Noble to simulate the full life-cycle of a data warehouse. There's a guide on ETL best practices there.

Eric Ness
