Our shop relies heavily on SSIS to run our back-end processes and database tasks. We have hundreds of jobs overall, and for the most part everything runs smoothly and efficiently.

Most of the time, a job failure is due to an external dependency failing (data not available, files not delivered, etc.). Right now, our process is set up to email us every time a job fails: SSIS generates an email with the name of the job and the step it failed on.

I'm looking at creating a dashboard of sorts to monitor this process more efficiently. I know that the same information available in the Job History window is also available by querying the msdb database. I want to set up a central location for reporting failures (probably using SQL Server Reporting Services), along with a more intelligent email alert system.
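
Something along these lines is what I have in mind for pulling recent failures out of msdb (a rough sketch; sysjobs/sysjobhistory column names are from SQL Server 2005/2008, and run_status = 0 means the step failed):

    -- Recent failed job steps, newest first.
    SELECT  j.name AS job_name,
            h.step_id,
            h.step_name,
            msdb.dbo.agent_datetime(h.run_date, h.run_time) AS run_datetime,
            h.message
    FROM    msdb.dbo.sysjobhistory AS h
    JOIN    msdb.dbo.sysjobs       AS j ON j.job_id = h.job_id
    WHERE   h.run_status = 0   -- 0 = failed
      AND   h.run_date >= CONVERT(INT, CONVERT(CHAR(8), DATEADD(DAY, -1, GETDATE()), 112))
    ORDER BY run_datetime DESC;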

Has anyone else dealt with this issue? If so, what kind of processes/reporting did you create around the SSIS procedures to streamline notification of job failures or alerts?

+1  A: 

We have a similar setup at our company. We primarily rely on letting the jobs notify us when there is a problem and we have employees who check job statuses at specific times to ensure that everything is working properly and nothing was overlooked.

My team receives a SQL Server Agent Job Activity Report HTML email twice a day, at 6am and 4pm, that lists all failed jobs at the top, running jobs below that, and all other jobs below that, grouped into daily, weekly, monthly, quarterly, and other categories. We essentially monitor SQL Server Agent jobs, not the SSIS packages themselves, and we rely on job categories and job schedule naming conventions to automate the grouping in the report.
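
The report itself is essentially a query over msdb that rolls up each job's last outcome by category. This isn't our exact query, but the shape of it is roughly this:

    -- Last outcome per job, grouped by job category, failures sorted to the top.
    SELECT  c.name AS category_name,
            j.name AS job_name,
            j.enabled,
            s.last_run_outcome,   -- 0 = failed, 1 = succeeded, 3 = canceled, 5 = unknown
            s.last_run_date,
            s.last_run_time
    FROM    msdb.dbo.sysjobs       AS j
    JOIN    msdb.dbo.syscategories AS c ON c.category_id = j.category_id
    JOIN    msdb.dbo.sysjobservers AS s ON s.job_id = j.job_id
    ORDER BY CASE WHEN s.last_run_outcome = 0 THEN 0 ELSE 1 END,   -- failures first
             c.name,
             j.name;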

We have a similar setup for monitoring our SSRS subscriptions, although we only check that one once a day since most of our subscriptions are triggered around 3am-4am. The SSRS Subscription Activity Report goes one step further than the SQL Server Agent Job Activity Report: it links to the subscription screen for each report and has more exception handling built into it.
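
The subscription report is built the same way, just against the ReportServer catalog instead of msdb. A sketch of the underlying query (table and column names are from SSRS 2005/2008, so double-check them on your version):

    -- Last delivery status and run time for each SSRS subscription.
    SELECT  c.Path,
            c.Name AS report_name,
            s.Description,
            s.LastStatus,     -- delivery confirmation or the delivery error text
            s.LastRunTime
    FROM    ReportServer.dbo.Subscriptions AS s
    JOIN    ReportServer.dbo.[Catalog]     AS c ON c.ItemID = s.Report_OID
    ORDER BY s.LastRunTime DESC;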

Aside from relying on reports, we also have a few jobs that are set to notify the operator via email upon job completion instead of upon job failure. This makes it easy to quickly check whether all the major ETL processes have run successfully; it's an early indicator of the health of the system. If we haven't received this email by the time the first team member gets into the office, then we know something is wrong. We also have a series of jobs that will fail with a job error if certain data sources haven't been loaded by a specific time. Before I had someone working an early shift, I used to check my iPhone for the email any time I woke up in the middle of the night (which happened a lot back then since I had a newborn baby). On the rare occasion that I didn't receive the email indicating everything had completed, or I received an error about a job step, I logged onto my machine via remote desktop to check the status of the jobs.
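
Both pieces of that are simple to set up. Switching a job to notify on completion rather than on failure is just the job's notification level, and the "fail if the data isn't there" jobs are a guard step scheduled at the cutoff time. Rough sketch below; 'Nightly ETL Master', 'ETL Team', and dbo.StagingOrders are made-up names to swap for your own:

    -- Email the operator whenever the job completes, not only when it fails.
    EXEC msdb.dbo.sp_update_job
         @job_name                   = N'Nightly ETL Master',   -- hypothetical job name
         @notify_level_email         = 3,                       -- 0 = never, 1 = on success, 2 = on failure, 3 = always
         @notify_email_operator_name = N'ETL Team';             -- hypothetical operator

    -- Guard step scheduled at the cutoff time: fail the job if today's data hasn't landed yet.
    IF NOT EXISTS (SELECT 1
                   FROM   dbo.StagingOrders                     -- hypothetical staging table
                   WHERE  LoadDate >= DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0))
        RAISERROR('Source data has not been loaded by the cutoff time.', 16, 1);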

I considered having our data center guys check on the status of the servers by running a report each morning around 4am, but in the end I decided this wasn't necessary since we have a person who starts work at 6am. My main concern about implementing that process was that our ETL changes over time, and I would have had to maintain documentation on how to check the jobs properly and how to escalate notifications to my team when a problem was detected. I would be willing to do that if the processes had to run in the middle of the night. As it is, our ETL runs every hour of the day, and even if we had to kick off all our major ETL processes in the early morning, we would still finish loading our data warehouse and publishing reports before anyone made it into the office. Also, our office starts REALLY late for some reason, so people don't normally run our reports interactively until 9am onward.
