views: 372
answers: 4
I have a number of stored procs which I would like to run simultaneously on the server, ideally without any reliance on connections from an external client.

What options are there to launch all these and have them run simultaneously (I don't even need to wait until all the processes are done to do additional work)?

I have thought of:

  • Launching multiple connections from a client and having each start the appropriate SP.
  • Setting up a job for each SP and starting the jobs from a SQL Server connection or SP (see the sketch after this list).
  • Using xp_cmdshell to launch additional sessions, the equivalent of running osql or whatever.
  • SSIS - I need to see whether the package can be dynamically written to handle more SPs, because I'm not sure how much access my clients are going to get to production.
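
For the jobs approach, here's a minimal sketch of what I have in mind - the job name and the proc dbo.LoadFactSales are hypothetical, and it assumes rights on msdb. The key point is that sp_start_job returns as soon as the job is queued, so several jobs started in a loop run concurrently without the caller waiting:

    USE msdb;

    -- One single-step job per stored proc to be launched.
    EXEC dbo.sp_add_job @job_name = N'Load_FactSales';

    EXEC dbo.sp_add_jobstep
        @job_name      = N'Load_FactSales',
        @step_name     = N'Run proc',
        @subsystem     = N'TSQL',
        @database_name = N'DW',
        @command       = N'EXEC dbo.LoadFactSales;';

    -- Register the job on the local server so the Agent will run it.
    EXEC dbo.sp_add_jobserver @job_name = N'Load_FactSales';

    -- Fire and forget: this returns immediately after queueing the job.
    EXEC dbo.sp_start_job @job_name = N'Load_FactSales';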

In the job and cmdshell cases, I'm probably going to run into permission-level problems with the DBA...

SSIS could be a good option - if I can table-drive the SP list.

This is a data warehouse situation, the work is largely independent, and NOLOCK is used universally on the stars. The system is an 8-way, 32 GB machine, so I'm going to load it down and scale back if I see problems.

I basically have three layers. Layer 1 has a small number of processes and depends on essentially all the facts/dimensions already being loaded (effectively, the stars are Layer 0 - and yes, unfortunately they will all need to be loaded). Layer 2 has a number of processes which depend on some or all of Layer 1, and Layer 3 has a number of processes which depend on some or all of Layer 2. I already have the dependencies in a table, and would initially launch all the procs in a particular layer at the same time, since they are orthogonal within a layer.
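
For illustration, the control table is shaped something like this (the names are hypothetical, but the layer numbering matches the description above):

    -- One row per proc; Layer 0 = the stars, Layers 1-3 as described.
    CREATE TABLE dbo.ProcSchedule (
        ProcName sysname NOT NULL PRIMARY KEY,
        Layer    tinyint NOT NULL
    );

    -- Everything within a layer is orthogonal, so all of a layer's procs
    -- can be launched simultaneously once the previous layer completes.
    SELECT ProcName
    FROM dbo.ProcSchedule
    WHERE Layer = 1;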

+2  A: 

Is SSIS an option for you? You can create a simple package with parallel Execute SQL tasks to execute the stored procs simultaneously. However, depending on what your stored procs do, you may or may not see a benefit from running them in parallel (e.g., if they all access the same rows, one may have to wait for locks to be released).

Codewerks
SSIS could be a good option - if I can table-drive the SP list - because I'm not sure how much access my clients are going to get to production. This is a data warehouse situation, and the work is largely independent; NOLOCK is universally used on the stars. The system is an 8-way, 32 GB machine.
Cade Roux
+1  A: 

At one point I did some architectural work on a product known as Acumen Advantage, which has a warehouse manager that does this.

The basic strategy is to have a control database with a list of the sprocs and their dependencies. Based on the dependencies, you can do a topological sort to give them an order in which to run. If you do this, you need to enforce the dependencies: all of the predecessors of a stored procedure must complete before it executes. Just starting the sprocs in order on multiple threads will not accomplish this by itself.
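
As a sketch of the idea (the control tables dbo.Procs and dbo.ProcDependency are hypothetical, and the dependency graph must be acyclic): a recursive CTE can assign each sproc a level, and everything at the same level can safely run in parallel once the previous level has finished.

    -- dbo.Procs(ProcName), dbo.ProcDependency(ProcName, DependsOn)
    WITH Ordered AS (
        -- Sprocs with no predecessors form level 0.
        SELECT p.ProcName, 0 AS Lvl
        FROM dbo.Procs AS p
        WHERE NOT EXISTS (SELECT 1 FROM dbo.ProcDependency AS d
                          WHERE d.ProcName = p.ProcName)
        UNION ALL
        -- A dependent sproc sits at least one level below each predecessor.
        SELECT d.ProcName, o.Lvl + 1
        FROM Ordered AS o
        JOIN dbo.ProcDependency AS d ON d.DependsOn = o.ProcName
    )
    -- A sproc's level is the length of the longest path leading to it,
    -- so all of its predecessors are guaranteed to be on earlier levels.
    SELECT ProcName, MAX(Lvl) AS RunLevel
    FROM Ordered
    GROUP BY ProcName
    ORDER BY RunLevel, ProcName;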

Implementing this meant knocking much of the SSIS functionality on the head and building another scheduler. That is fine for a product, but probably overkill for a bespoke system. A simpler approach is as follows:

You can manage the dependencies at a more coarse-grained level by organising the ETL vertically by dimension (sometimes known as Subject Oriented ETL) where a single SSIS package and set of sprocs takes the data from extraction through to producing dimensions or fact tables. Typically the dimensions will mostly be siloed, so they will have minimal interdependency. Where there is interdependency, make one dimension (or fact table) load process dependent on whatever it needs upstream.

Each loader becomes relatively modular, and you still get a useful degree of parallelism by kicking off the load processes in parallel and letting the SSIS scheduler sort it out. The dependencies will contain some redundancy. For example, an ODS table may not depend on a dimension load completing, but the upstream package takes the components right through to the dimensional schema before it finishes. However, this is unlikely to be an issue in practice, for the following reasons:

  • The load process probably has plenty of other tasks that can execute in the meantime
  • The most resource-hungry tasks will almost certainly be the fact table loads, which will mostly not be dependent on each other. Where there is a dependency (e.g. a rollup table based on the contents of another table) this cannot be avoided anyway.

You can construct the SSIS packages so they pick up all of their configuration from an XML file whose location is supplied externally in an environment variable. This sort of thing can be fairly easily implemented with scheduling systems like Control-M. It means that a modified SSIS package can be deployed with relatively little manual intervention. The production staff can be handed the packages to deploy along with the stored procedures, and can maintain the config files on a per-environment basis without having to manually fiddle with configuration inside the SSIS packages.

ConcernedOfTunbridgeWells
Updating the question.
Cade Roux
You will also probably get a maintenance win from a subject-oriented ETL architecture if you have potentially more than one developer and multiple source systems.
ConcernedOfTunbridgeWells
+1  A: 

You might want to look at Service Broker and its activation stored procedures... it might be an option.
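
A minimal sketch of the queue side (the queue name and the activated proc dbo.RunNextProc are hypothetical): Service Broker starts up to MAX_QUEUE_READERS copies of the activation proc as messages arrive, which gives you server-side parallelism with no client connection at all.

    -- Each message would name a stored proc; dbo.RunNextProc reads the
    -- message and executes that proc. Up to 8 readers run concurrently.
    CREATE QUEUE dbo.ProcLaunchQueue
        WITH STATUS = ON,
        ACTIVATION (
            STATUS = ON,
            PROCEDURE_NAME = dbo.RunNextProc,
            MAX_QUEUE_READERS = 8,
            EXECUTE AS OWNER
        );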

Mladen
I'll definitely add that as an option.
Cade Roux
A: 

In the end, I created a C# management console program which launches the processes asynchronously as they become eligible to run and keeps track of the connections.

Cade Roux