views: 194
answers: 4

I'm trying to figure out the most efficient way to run a pretty hefty PHP task thousands of times a day. It needs to make an IMAP connection to Gmail, loop over the emails, save the info to the database, and save images locally.

Running this task every so often with a cron job isn't that big of a deal, but I need to run it every minute, and I know that eventually the cron jobs will start running on top of each other and cause memory issues.

What is the next step up when you need to efficiently run a task multiple times a minute? I've been reading about Beanstalkd & Pheanstalk, and I'm not entirely sure whether they will do what I need. Thoughts?

+9  A: 

I'm not a PHP guy, but ... what prevents you from running your script as a daemon? I've written many a Perl script that does just that.

Brian Roach
I've never written a daemon before, but I will start doing some more research now. Thanks for the suggestion.
mike
Basically ... you just wrap everything in a `while(1)` and run the script in the background. If it's important that it finishes doing something rather than just being killed, look into signal handling so you can clean up before exiting. Bonus points for forking rather than requiring that it be run from the shell in the background :)
Brian Roach
I would suggest two files: the first one creates another process which runs the daemon. The first one then just waits a couple of seconds and checks whether the daemon is still running; if not, it can re-launch it. I don't really trust PHP for running such a long time, so I think it's better to take precautions.
Savageman
PHP scripts have no problem with long run times; we have scripts here that run for weeks without problems. You don't have to like PHP (I don't), but the language has matured a lot and is now quite stable.
dbemerlin
@Brian Roach: I believe PHP has traditionally had more memory leak issues than Perl. That, and of course valuing your sanity :)
Duncan
@mike, even if you do it in PHP, you can take a look at Perl for the basic concepts: http://search.cpan.org/~ehood/Proc-Daemon-0.03/Daemon.pm (double forking and other system-level details to make it more robust)
Unreason
That's just a safety measure for robustness. I don't know whether it's recommended or not with Perl, but I'll do the same. ;)
Savageman
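
A minimal PHP sketch of the daemon approach described above (wrap the work in a loop, fork into the background, handle SIGTERM so the current pass can finish) might look like this. It assumes PHP 5.3+ with the pcntl and posix extensions, and process_mailbox() is a placeholder for the actual IMAP work:

<?php
  // Fork and let the parent exit so the worker keeps running in the background.
  $pid = pcntl_fork();
  if ($pid < 0) exit(1);        // fork failed
  if ($pid > 0) exit(0);        // parent exits, child carries on
  posix_setsid();               // detach from the controlling terminal

  // Catch SIGTERM so we can finish the current pass instead of dying mid-task.
  $running = true;
  pcntl_signal(SIGTERM, function () use (&$running) { $running = false; });

  while ($running)
  {
    process_mailbox();          // placeholder: fetch mails, save images, etc.
    sleep(60);
    pcntl_signal_dispatch();    // run any pending signal handlers
  }
?>
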
+7  A: 

One option is to create a locking mechanism so the scripts won't overlap. This is quite simple since the scripts only run every minute; a simple .lock file would suffice:

<?php
  // Bail out if a previous run is still active.
  if (file_exists("foo.lock")) exit(0);
  file_put_contents("foo.lock", getmypid());

  do_stuff_here();

  unlink("foo.lock");   // release the lock
?>

This will make sure the scripts don't run in parallel; you just have to make sure the .lock file is deleted when the program exits, so you should have a single point of exit (except for the exit at the beginning).
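
If tracking every exit point is a concern, one variation (not from the answer above, just a sketch) is to let flock() manage the lock: the OS releases it automatically when the process exits, even if the script dies before reaching unlink(). Reusing foo.lock and do_stuff_here() from the example:

<?php
  $fp = fopen("foo.lock", "w");
  if (!flock($fp, LOCK_EX | LOCK_NB)) exit(0);  // a previous run still holds the lock

  do_stuff_here();

  flock($fp, LOCK_UN);   // also released automatically if the script dies early
  fclose($fp);
?>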

A good alternative - as Brian Roach suggested - is a dedicated server process that runs all the time and keeps the connection to the IMAP server open. This reduces overhead a lot and is not much harder than writing a normal PHP script:

<?php
  connect();
  while (is_world_not_invaded_by_aliens())
  {
    get_mails();
    get_images();
    sleep(time_to_next_check());
  }
  disconnect();
?>
dbemerlin
I think the daemon is going to be my best bet, and keeping the IMAP connection open should make things a lot quicker. Thanks for the advice!
mike
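
To make the pseudocode above concrete, a rough sketch using PHP's imap extension could look like the following. The Gmail mailbox string, the credentials and save_message() are placeholders, and a real daemon would also want reconnect logic for dropped connections:

<?php
  // Placeholder mailbox string and credentials for Gmail over IMAP/SSL.
  $imap = imap_open("{imap.gmail.com:993/imap/ssl}INBOX", "user@gmail.com", "password");

  while (true)
  {
    $unseen = imap_search($imap, "UNSEEN");   // message numbers of unread mails
    if ($unseen !== false)
    {
      foreach ($unseen as $num)
      {
        $header = imap_headerinfo($imap, $num);
        $body   = imap_body($imap, $num);
        save_message($header, $body);         // placeholder: write to DB, store images
      }
    }
    sleep(60);                                // time_to_next_check() in the sketch above
    // A real daemon would also catch a shutdown signal here and call imap_close($imap).
  }
?>
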
+3  A: 

I've got a number of scripts like these, where I don't want to run them from cron in case they stack up.

#!/bin/sh
# run one pass, wait a minute, then restart this script in place
php -f fetchFromImap.php
sleep 60
exec $0

The `exec $0` part starts the script running again, replacing itself in memory, so it will run forever without issues. Any memory the PHP script uses is cleaned up whenever it exits, so that's not a problem either.

A simple line will start it, and put it into the background:

cd /x/y/z ; nohup ./loopToFetchMail.sh &

or it can similarly be started when the machine boots, using various means (such as cron's '@reboot ....' entry).

Alister Bulman
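
For instance, a crontab entry along these lines (same hypothetical path as above) would start the loop once at boot:

@reboot cd /x/y/z && nohup ./loopToFetchMail.sh &
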
A: 

fcron (http://fcron.free.fr/) will not start a new job if the old one is still running, so you could use `@ 1 command` and not worry about race conditions.
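
For example, an fcrontab entry like this (the PHP binary path and script name are hypothetical) would run the job every minute while letting fcron skip a run if the previous one hasn't finished:

@ 1 /usr/bin/php -f /path/to/fetchFromImap.php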

frx