tags:
views: 75
answers: 1

I have a challenge that I am trying to solve, and I can't work out from the documentation or the examples whether SSIS is suitable for my problem.

I have two tables, jobs and tasks. Jobs represent a large piece of work, while tasks are tied to jobs; there will typically be anything from 1 task per job to 1,000,000 tasks per job. Each task has a column storing its job_id, and job_id is the primary key of the jobs table.

Every N hours, I want to do the following:

  1. Take all of the job rows for jobs that have completed since the last run (i.e. they have an end_time value, and that value falls between the last run time and now) and add these to the jobs table in the 'query' database.

  2. Copy all of the tasks whose job_id matches a job included in step 1 into the tasks table in the 'query' database.

Basically, I want to be able to regularly update my query database, but I only want to include completed jobs (hence the requirement of an end_time) and tasks from those completed jobs.
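
In plain SQL, the two steps would look roughly like the sketch below. This is an illustration only; the database names (live, query), the column lists, and the @last_run/@now window parameters are assumptions, not taken from a real schema.

    -- Window parameters for this run (values are illustrative).
    DECLARE @last_run datetime2 = '2009-06-01T00:00:00';
    DECLARE @now      datetime2 = SYSUTCDATETIME();

    -- Step 1: jobs completed within the window go into the query database.
    INSERT INTO query.dbo.jobs (job_id, end_time)  -- plus any other job columns
    SELECT j.job_id, j.end_time
    FROM live.dbo.jobs AS j
    WHERE j.end_time IS NOT NULL
      AND j.end_time >  @last_run
      AND j.end_time <= @now;

    -- Step 2: tasks whose parent job was picked up in step 1.
    INSERT INTO query.dbo.tasks (task_id, job_id)  -- plus any other task columns
    SELECT t.task_id, t.job_id
    FROM live.dbo.tasks AS t
    WHERE EXISTS (SELECT 1
                  FROM live.dbo.jobs AS j
                  WHERE j.job_id   = t.job_id
                    AND j.end_time IS NOT NULL
                    AND j.end_time >  @last_run
                    AND j.end_time <= @now);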

This is likely to run 2-3 times per day so that users can query an almost-up-to-date copy of the live data.

Is SSIS suitable for this task, and if so, can you please point me to some documentation showing how a column from the results of one step can be used as the criteria for a second step?

Thanks in advance...

A: 

Sure, SSIS can do that.

If you want to be sure that the child records are moved, use a query as the data flow source for the second data flow. You insert the records into the main table in the first data flow. Then you use a query that picks any records in the source child table that are not in the destination child table and that have matching records in the parent destination table. This way you also catch changes to existing closed records (you know there will be some; someone will close a job too soon, then reopen it and add something to it).
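
A sketch of such a source query, assuming the same illustrative live/query names as above and that task_id uniquely identifies a task:

    -- Pick up any task whose job has already landed in the destination
    -- but which has not itself been copied yet.
    SELECT t.task_id, t.job_id  -- plus any other task columns
    FROM live.dbo.tasks AS t
    WHERE EXISTS (SELECT 1
                  FROM query.dbo.jobs AS j
                  WHERE j.job_id = t.job_id)
      AND NOT EXISTS (SELECT 1
                      FROM query.dbo.tasks AS d
                      WHERE d.task_id = t.task_id);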

Alternatively, you can add the records you are moving to a staging table, then join to that table in the data flow for the child tables. This ensures that the child tables are populated for exactly the records you moved.
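
A sketch of that join (staging_jobs is a hypothetical table, assumed to be truncated and then loaded with the moved job_ids by the first data flow):

    -- Source query for the child data flow: only tasks whose parent job
    -- was recorded in the staging table during this run.
    SELECT t.task_id, t.job_id  -- plus any other task columns
    FROM live.dbo.tasks AS t
    INNER JOIN query.dbo.staging_jobs AS s
        ON s.job_id = t.job_id;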

Or, if you are loading a denormalized data warehouse, just write a query that joins the parent and child tables together with a WHERE clause requiring the end date to be not null. Of course, don't forget to check for records that aren't already in the data warehouse.
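
A rough version of that single query, with a hypothetical flattened destination table dw.dbo.job_tasks standing in for the warehouse:

    -- One flat row per task, restricted to completed jobs; the NOT EXISTS
    -- skips tasks that are already in the warehouse.
    SELECT j.job_id, j.end_time, t.task_id  -- plus any other job/task columns
    FROM live.dbo.jobs AS j
    INNER JOIN live.dbo.tasks AS t
        ON t.job_id = j.job_id
    WHERE j.end_time IS NOT NULL
      AND NOT EXISTS (SELECT 1
                      FROM dw.dbo.job_tasks AS w
                      WHERE w.task_id = t.task_id);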

HLGEM
Brilliant - thanks... I'll persist with the docs and try to understand how to do that. Fortunately there is no way for a job to be reopened, so there is not much risk there; it's an atomic submission that includes the job and the tasks. One final indulgence, please... do you have any pointers to a doc that explains how SSIS knows what data is already on the target and what data still needs to be moved? In my scripted solution, I get a datestamp (including time), pull the last run's stamp out of the DB, and then fetch the rows in that window. At the end, I put the new datestamp into the DB.
Anonymouslemming
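
For reference, the last-run-timestamp pattern described in the comment above, as a minimal T-SQL sketch (run_log is a hypothetical bookkeeping table):

    -- Read the previous watermark and capture the new one.
    DECLARE @now datetime2 = SYSUTCDATETIME();
    DECLARE @last_run datetime2;
    SELECT @last_run = MAX(run_time) FROM query.dbo.run_log;

    -- ... copy jobs/tasks with end_time in (@last_run, @now] here ...

    -- Record the new watermark only after the copy succeeds.
    INSERT INTO query.dbo.run_log (run_time) VALUES (@now);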