Background:

I have a folder that continuously receives new files. My SSIS package needs to process the files and delete them. The package is scheduled to run once every minute. I pick up the files in ascending order of file creation time, building an array of files and then processing and deleting them one at a time.
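Roughly, the pickup step looks like this (a simplified sketch; the folder path is a placeholder and would come from a package variable in the real package):

    // C# Script Task sketch: enumerate files oldest-first by creation time.
    using System.IO;
    using System.Linq;

    string[] files = new DirectoryInfo(@"C:\inbox")   // placeholder path
        .GetFiles()
        .OrderBy(f => f.CreationTimeUtc)              // oldest first
        .Select(f => f.FullName)
        .ToArray();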

Problem:

If an instance of my package takes longer than one minute to run, the next instance of the SSIS package will pick up some of the files the previous instance still has in its buffer. By the time the second instance of the package gets around to processing a file, it may already have been deleted by the first instance, creating an exception condition.

I was wondering whether there was a way to avoid the exception condition.

Thanks.

+1  A: 

Can you check for the existence of the file before you delete it?

File.Exists(filepathandname)
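For example (a sketch, assuming files is the array you built; ProcessFile stands in for your own processing logic):

    using System.IO;

    foreach (string path in files)
    {
        // Skip files another instance has already consumed.
        if (!File.Exists(path))
            continue;

        ProcessFile(path);          // placeholder for real processing

        // Guard the delete the same way. The check and the delete are
        // still not atomic, so this narrows the race rather than closing it.
        if (File.Exists(path))
            File.Delete(path);
    }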

Gratzy
Yes, for the delete part. But the processing part would still be duplicated.
RT
Can you move the files when they are picked up, then check for existence before the next set of processing is due to start?
revelator
@revelator, Yes, that is possible. However, that entails creating a temporary directory, moving the files to that directory, processing them, and deleting them. If the process fails for whatever reason, you have files "stuck" in the temp directory. I was wondering whether there is a "cleaner" solution.
RT
+1  A: 

How are you scheduling the job? If you are using the SQL Server Job Scheduler, I'm under the impression it should not re-run a job that is already running; see this SO question: Will a SQL Server Job skip a scheduled run if it is already running?

Alternatively, rather than moving the files around, you could build a step into your job that tests whether the job is already running. I've not done this myself, but it appears to be possible; have a read of this article: Detecting The State of a SQL Server Agent Job.
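For illustration, a rough sketch of such a check against msdb from .NET (not the article's exact approach; assumes adequate permissions on msdb):

    using System.Data.SqlClient;

    // Returns true if a session of the named Agent job is still running.
    static bool IsJobRunning(string connectionString, string jobName)
    {
        const string sql = @"
            SELECT COUNT(*)
            FROM msdb.dbo.sysjobactivity a
            JOIN msdb.dbo.sysjobs j ON j.job_id = a.job_id
            WHERE j.name = @jobName
              AND a.start_execution_date IS NOT NULL
              AND a.stop_execution_date IS NULL;";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@jobName", jobName);
            conn.Open();
            return (int)cmd.ExecuteScalar() > 0;
        }
    }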

Tony
Thanks for the info. Since Agent will not run a second instance until the first instance is complete, I can safely schedule my package at one minute intervals.
RT
A: 

To make sure your packages are not stepping on the same files, you could just create an empty marker file named like the data file but with another extension (like mydata.csv.being_processed) and make sure your Data Flow Task runs only on files that don't have such a marker.

This acts as a lock.
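For instance, a sketch of claiming and releasing the marker (the helper names are illustrative):

    using System.IO;

    // Claim a data file by creating its marker with CreateNew, which
    // fails if the marker already exists - so only one instance wins.
    static bool TryClaim(string dataFile)
    {
        try
        {
            using (File.Open(dataFile + ".being_processed", FileMode.CreateNew)) { }
            return true;
        }
        catch (IOException)
        {
            return false;   // another instance is processing this file
        }
    }

    static void Release(string dataFile)
    {
        File.Delete(dataFile + ".being_processed");
    }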

Of course, you could change the way you're scheduling your jobs, but often - when we encounter such an issue - it's because we have no leverage over those things :)

Cesario
A: 

You can create a "lock file" to prevent parallel execution of packages. To protect yourself against the case of a crashed package, consider using the file's creation date to emulate a lock timeout.

I.e., at the beginning of the package, check for the existence of the lock file. If it doesn't exist OR it was created more than X hours ago, then continue with the import. Otherwise, exit.
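Something like this sketch (the lock path and timeout are whatever fits your setup):

    using System;
    using System.IO;

    // Returns true if this run may proceed: no lock file, or the lock
    // is older than the timeout (assume the earlier run crashed).
    static bool AcquireLock(string lockPath, TimeSpan timeout)
    {
        if (File.Exists(lockPath) &&
            DateTime.UtcNow - File.GetCreationTimeUtc(lockPath) < timeout)
        {
            return false;   // a live instance holds the lock
        }

        File.Delete(lockPath);                      // no-op if missing
        File.WriteAllText(lockPath, string.Empty);
        // Refresh the timestamp explicitly; NTFS can "tunnel" the old
        // creation time onto a quickly re-created file.
        File.SetCreationTimeUtc(lockPath, DateTime.UtcNow);
        return true;
    }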

A: 

I have a similar situation. What you do is have your SSIS package read all files within a folder and create a work file like 'process.txt'. This creates a list of valid files at that point in time. If you have multiple packages, then create the file with a name like 'process_<package name>.txt'. Each package will only process the files named in its own process file. This will prevent overlap.
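A sketch of that snapshot step (folder and file names are placeholders):

    using System.IO;

    // Snapshot the folder into a work file; later steps read the work
    // file and touch only the files listed there, so overlapping runs
    // don't collide.
    static void WriteWorkFile(string folder, string workFile)
    {
        File.WriteAllLines(workFile, Directory.GetFiles(folder));
    }

    static string[] ReadWorkFile(string workFile)
    {
        return File.ReadAllLines(workFile);
    }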

Josef Richberg
