I have a challenge that I am trying to solve, and I can't work out from the documentation or the examples whether SSIS is suitable for my problem.
I have 2 tables (jobs and tasks). Jobs represent a large piece of work, while tasks are tied to jobs. There will typically be anything from 1 task per job to 1,000,000 tasks per job. Each task has a column storing the job_id. The job_id in the jobs table is the primary key.
Every N hours, I want to do the following:
1. Take all of the job rows where the job has completed since the last run (that is, end_time is set and falls between the last run and now) and add them to the jobs table in the 'query' database.
2. Copy all of the tasks whose job_id belongs to one of the jobs from step 1 into the tasks table in the 'query' database.
Basically, I want to be able to regularly update my query database, but I only want to include completed jobs (hence the requirement of an end_time) and tasks from those completed jobs.
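To make the intent concrete, here is a rough sketch of the logic I am after, written in Python with sqlite3 rather than SSIS. It only illustrates the two steps (select completed jobs in the time window, then copy the tasks for exactly those job_ids). The jobs, tasks, job_id and end_time names are from my schema; the task_id column, the sync_completed function, the in-memory databases and the ISO-8601 text timestamps are assumptions made up for the example.

```python
import sqlite3


def sync_completed(live, query, last_run, now):
    """Copy jobs that ended in (last_run, now] and their tasks from live to query."""
    # Step 1: jobs with an end_time inside the window since the last run.
    jobs = live.execute(
        "SELECT job_id, end_time FROM jobs "
        "WHERE end_time IS NOT NULL AND end_time > ? AND end_time <= ?",
        (last_run, now),
    ).fetchall()
    query.executemany("INSERT INTO jobs (job_id, end_time) VALUES (?, ?)", jobs)

    # Step 2: the job_ids found in step 1 drive which tasks get copied.
    copied_tasks = 0
    for job_id, _end_time in jobs:
        # fetchall() keeps the sketch simple; a real run with up to
        # 1,000,000 tasks per job would want to copy in batches.
        tasks = live.execute(
            "SELECT task_id, job_id FROM tasks WHERE job_id = ?", (job_id,)
        ).fetchall()
        query.executemany(
            "INSERT INTO tasks (task_id, job_id) VALUES (?, ?)", tasks
        )
        copied_tasks += len(tasks)

    query.commit()
    return len(jobs), copied_tasks


def make_db():
    # Hypothetical minimal schema; task_id is an assumed key column.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, end_time TEXT)")
    db.execute("CREATE TABLE tasks (task_id INTEGER PRIMARY KEY, job_id INTEGER)")
    return db


live, query = make_db(), make_db()
live.executemany(
    "INSERT INTO jobs VALUES (?, ?)",
    [
        (1, "2024-01-01T09:00:00"),  # completed inside the window
        (2, None),                   # still running, no end_time
        (3, "2023-12-31T09:00:00"),  # completed before the last run
    ],
)
live.executemany(
    "INSERT INTO tasks VALUES (?, ?)",
    [(10, 1), (11, 1), (20, 2), (30, 3)],
)

result = sync_completed(
    live, query, "2024-01-01T00:00:00", "2024-01-01T12:00:00"
)
```

With this sample data only job 1 and its two tasks should end up in the 'query' database. What I can't see is how to express the "use the job_ids from step 1 as the filter for step 2" part in SSIS.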
This is likely to be done 2 - 3 times per day so that users are able to query an almost-up-to-date copy of the live data.
Is SSIS suitable for this task? If so, can you please point me to some documentation showing how a column from the results of one step can be used as the criteria for a second step?
Thanks in advance...