I need a cron job that checks email accounts and downloads certain attachments. The cron job will run every minute. My problem is that while the first script is downloading email attachments, a second (or third, or fourth...) script will most likely be running alongside it. I am not sure of the best way to prevent subsequent scripts from trying to download attachments that a previous script has already downloaded (or is currently downloading).

Other details:

  • Using CodeIgniter
  • Using Zend Mail library to access emails and download attachments
  • Email is probably going to be POP3

I'm wondering if I can use the "Message-ID" header somehow.

+1  A: 

You can prevent concurrent cron scripts from running by using a lock file. You probably don't want concurrent scripts downloading the attachments, because you will get nasty race conditions.

Works like this:

starter.sh

  • Checks whether the lock file exists; if it does, exits immediately.
  • If it does not, creates the lock file and starts the main.php script.
  • After main.php completes, deletes the lock file.
  • This way, if main.php crashes or throws an exception, the lock file is still deleted and the job can start again the next minute.

main.php

  • does the actual downloading.
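
The starter/worker split above can be sketched roughly as follows. This is a minimal illustration in Python rather than the shell and PHP the answer has in mind, and the lock path and the `run_exclusively` name are hypothetical. It assumes the lock is created with an exclusive-create flag, so that the "check" and "create" steps happen as one atomic operation instead of two separate ones.

```python
import os
import tempfile

# Hypothetical lock location; any path all runs agree on would do.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "mail_fetch.lock")


def run_exclusively(job, lock_path=LOCK_PATH):
    """Run job() only if no other run holds the lock file.

    Returns True if the job ran, False if another run was in progress.
    """
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so two
        # starters cannot both believe they created the lock.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    try:
        os.write(fd, str(os.getpid()).encode())
        os.close(fd)
        job()  # stands in for main.php
        return True
    finally:
        # Runs even when job() raises, so the next minute's run can start.
        os.remove(lock_path)
```

If the wrapper itself is killed outright, the lock file stays behind, so a real setup would probably also want some way to detect a stale lock (the PID written into the file is one possible starting point).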

Edit

I see. You then need a more complex setup. First, a single-threaded script reads the from, subject and datetime of each message and sticks them in a queue (a DB table is perfect for this). It should run under a lock file as I explained above. Second, you have a pool of PHP processes that read the DB table and download attachments based on the data in the table.

Since they are working with a thread-safe data structure (the database), they can set a flag on each row that says downloading/downloaded.

They just work in a loop with a SQL select at the top (select the top 1 message that is not yet downloaded or downloading). You will need something to handle crashes and restart the PHP processes, but that is the gist of it.

Byron Whitlock
Multiple scripts running at the same time is actually ideal. I'm trying to simulate multi-threading. I'm dealing with thousands of email attachments per hour.
StackOverflowNewbie