I am trying to find some development best practises for SQL Server Reporting Services, Analysis Services and Integration Services.
Does anyone have some useful links or guidance they can offer on this subject?
I am trying to find some development best practises for SQL Server Reporting Services, Analysis Services and Integration Services.
Does anyone have some useful links or guidance they can offer on this subject?
I have read some SSIS stuff here
http://stackoverflow.com/questions/142015/ssis-gurus-links-to-great-learning-content-requested
http://stackoverflow.com/questions/440346/how-to-approach-an-etl-mission
I can only talk specifically to SSIS although some of this wil be applicable to the others as well.
Save your packages as files and put them in Source Control.
Where possible use variables for things that will change from server to server or run to run.
Use configuration files to save the configuration for differnt environments.
When processing data that comes from an outside source, assume it will change format without warning (ie check to see that the data you expect in each column is the data you got!) Nothing like putting the emails in the lastname field (or as happened to us once in DTS, the social security number into the field that said how much to pay the person, sure glad we caught that before someone got paid that amount.).
Things I have seen happen include adding new columns, removing columns that are critical to your process, reaarranging the order of the columns (especially bad when the file itself does not have the column names), leaving the column titles the same but changing the data they contain (yes once I got a file where the last name data was in the column labelled First_name and vice versa), data with new values that don't have a match to values in your system (i'm think of look up type things here like medical specialties), flat out strange data such as notes in an email field, names in this format lastname - 'Willams, Jo' first_name - 'hn' (combine the two fields to get the whole name - apparently their data entry people just typed the name until they ran out of spaces and continued on in the next field no matter where they were in the name!).
Don't put uncleaned data into your database.
Always retain a copy of any files that you process or send out. Amazing how often you will need to research back.
Log errors and log records that needed cleaning, espcially if the problem in the field was such that it caused the process to fail. It is a whole lot easier to see the errors in a table than to know your 20 million record file failed because one record had an extra | in it and try to figure out which one it was.
If you do a lot of similar imports in SSIS, create a template project that has all the standard logging and data cleaning it it. It is a whole lot faster to start from a template and adjust to new mappings based onteh new file you are working with and make minor adjustoments to things specific to that file than to rewrite every SSIS package from scratch.
Store meta data. Sooner or later you will be asked, how often did it fail or how soon after the file was received did the import happen or even when was the last import. All our pacakges start and end with a task to store start and stop times in our meta data table. All failure paths include a task to mark the import as failed in our meta data. Eventually you can build a system that knows how many records to expect and fail it if the new file is significantly off. Meta data can also be used to store things like number of records which can help identify when they sent a partial file instead of the whole file you were expecting and prevent you from blowing away 300,000 sales targets they actually still want.