I work on a somewhat large web application, and the backend is mostly in PHP. There are several places in the code where I need to complete some task, but I don't want to make the user wait for the result. For example, when creating a new account, I need to send them a welcome email. But when they hit the 'Finish Registration' button, I don't want to make them wait until the email is actually sent; I just want to start the process and return a message to the user right away.

Up until now, in some places I've been using what feels like a hack with exec(). Basically doing things like:

exec("doTask.php " . escapeshellarg($arg1) . " " . escapeshellarg($arg2) . " " . escapeshellarg($arg3) . " >/dev/null 2>&1 &");

Which appears to work, but I'm wondering if there's a better way. I'm considering writing a system which queues up tasks in a MySQL table, and a separate long-running PHP script that queries that table once a second, and executes any new tasks it finds. This would also have the advantage of letting me split the tasks among several worker machines in the future if I needed to.

Am I re-inventing the wheel? Is there a better solution than the exec() hack or the MySQL queue?

+5  A: 

I've used the queuing approach, and it works well: you can defer that processing until your server is idle, which lets you manage your load quite effectively if you can easily partition off "tasks which aren't urgent".

Rolling your own isn't too tricky, but here are a few other options to check out:

  • ActiveMQ if you want a full blown open source message queue.
  • beanstalkd - only found this one while writing this answer, but looks interesting
  • dropr is a PHP based message queue project, but I'm not sure how mature that is at the moment.
  • Finally, a blog post about using memcached for message queuing

Another, perhaps simpler, approach is to use ignore_user_abort(). Once you've sent the page to the user, you can do your final processing without fear of premature termination, though this does have the effect of appearing to prolong the page load from the user's perspective.
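A minimal sketch of the ignore_user_abort() approach. The helper name respondThenRun() is made up for illustration. The call to fastcgi_finish_request() is an addition that only exists under PHP-FPM; where it is available it closes the connection before the slow work starts, which may avoid the apparent delay mentioned above. On other SAPIs the sketch falls back to flush(), and the browser may still show the page as loading.

```php
<?php
// Hypothetical helper: send the response first, then run a slow task.
function respondThenRun(string $body, callable $task): void
{
    ignore_user_abort(true); // keep running even if the client disconnects
    set_time_limit(0);       // assumption: the task may outlive max_execution_time

    echo $body;

    if (function_exists('fastcgi_finish_request')) {
        // PHP-FPM only: finish the request so the user stops waiting.
        fastcgi_finish_request();
    } else {
        // Other SAPIs: push the output out; the connection stays open.
        flush();
    }

    $task(); // the deferred work, e.g. sending the welcome email
}
```

Usage would look something like respondThenRun("Registration complete.", function () { /* send the email here */ });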

Paul Dixon
Thanks for all the tips. The specific one about ignore_user_abort doesn't really help in my case; my whole goal is to avoid unnecessary delays for the user.
davr
A: 

Unfortunately PHP does not have any kind of native threading capabilities. So I think in this case you have no choice but to use some kind of custom code to do what you want to do.

If you search around the net for PHP threading stuff, some people have come up with ways to simulate threads on PHP.

Peter D
A: 

PHP is a single-threaded language, so there is no official way to start an asynchronous process with it other than using exec or popen. There is a blog post about that here. Your idea of a queue in MySQL is a good one as well.

Your specific requirement here is for sending an email to the user. I'm curious as to why you are trying to do that asynchronously since sending an email is a pretty trivial and quick task to perform. I suppose if you are sending tons of email and your ISP is blocking you on suspicion of spamming, that might be one reason to queue, but other than that I can't think of any reason to do it this way.

Marc W
The email was just an example, since the other tasks are more complex to explain, and it's not really the point of the question. The way we used to send email, the email command wouldn't return until the remote server accepted the mail. We found that some mail servers were configured to add long delays (like 10-20 second delays) before accepting mail (probably to fight spambots), and these delays would then be passed onto our users. Now, we are using a local mailserver to queue up the mails to be sent, so this particular one doesn't apply, but we have other tasks of similar nature.
davr
+3  A: 

This is the same method I have been using for a couple of years now, and I haven't seen or found anything better. As people have said, PHP is single-threaded, so there isn't much else you can do.

I have actually added one extra level to this, and that's getting and storing the process id. This allows me to redirect to another page and have the user sit on that page, using AJAX to check if the process is complete (process id no longer exists). This is useful for cases where the length of the script would cause the browser to time out, but the user needs to wait for that script to complete before the next step. (In my case it was processing large ZIP files with CSV-like files that add up to 30 000 records to the database, after which the user needs to confirm some information.)
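A rough sketch of the process id idea on a UNIX-like system. The function names are made up for illustration, and the answer does not say how the id was actually obtained; this version relies on the shell's $! variable. The AJAX endpoint would just call isRunning() with the stored id.

```php
<?php
// Start a command in the background and return its process id.
// Assumes a POSIX shell, where $! holds the pid of the last background job.
function startBackground(string $command): int
{
    $output = [];
    // Single quotes keep PHP from interpreting $! as a variable.
    exec($command . ' > /dev/null 2>&1 & echo $!', $output);
    return (int) ($output[0] ?? 0);
}

// Report whether a process with this id still exists.
function isRunning(int $pid): bool
{
    if ($pid <= 0) {
        return false;
    }
    if (function_exists('posix_kill')) {
        // Signal 0 sends nothing; it only checks that the process exists.
        return posix_kill($pid, 0);
    }
    // Fallback when the posix extension is missing.
    exec('ps -p ' . $pid, $ignored, $status);
    return $status === 0;
}
```

Note that process ids get reused by the operating system, so a long-lived check may be safer if it also stores the start time or has the script write a "done" marker.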

I have also used a similar process for report generation. I'm not sure I'd use "background processing" for something such as an email, unless there is a real problem with a slow SMTP server. Instead I might use a table as a queue and then have a process that runs every minute to send the emails in the queue. You would need to be wary of sending emails twice or other similar problems. I would consider a similar queueing process for other tasks as well.
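One way to sketch the table-as-queue idea while guarding against double sends. The table and function names are invented for illustration, and an in-memory SQLite database stands in for MySQL so the example is self-contained; the same SQL pattern should carry over. The key step is the conditional UPDATE: only one worker can flip a row from 'pending' to 'running'.

```php
<?php
// Create the (hypothetical) queue table.
function createQueue(PDO $db): void
{
    $db->exec("CREATE TABLE tasks (
        id INTEGER PRIMARY KEY,
        payload TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'pending'
    )");
}

// Add a task; the payload is stored as JSON.
function enqueue(PDO $db, array $payload): void
{
    $stmt = $db->prepare("INSERT INTO tasks (payload) VALUES (?)");
    $stmt->execute([json_encode($payload)]);
}

// Claim the oldest pending task, or return null if there is none.
function claimNext(PDO $db): ?array
{
    $id = $db->query(
        "SELECT id FROM tasks WHERE status = 'pending' ORDER BY id LIMIT 1"
    )->fetchColumn();
    if ($id === false) {
        return null; // queue is empty
    }

    // Succeeds for exactly one worker, even if several saw the same id.
    $claim = $db->prepare(
        "UPDATE tasks SET status = 'running' WHERE id = ? AND status = 'pending'"
    );
    $claim->execute([$id]);
    if ($claim->rowCount() !== 1) {
        return null; // another worker claimed it first
    }

    $row = $db->prepare("SELECT payload FROM tasks WHERE id = ?");
    $row->execute([$id]);
    return ['id' => (int) $id, 'payload' => json_decode($row->fetchColumn(), true)];
}

// Mark a claimed task as finished.
function markDone(PDO $db, int $id): void
{
    $db->prepare("UPDATE tasks SET status = 'done' WHERE id = ?")->execute([$id]);
}
```

A cron job or long-running script would call claimNext() in a loop, send the email, then call markDone(). Tasks left in 'running' after a crash would need a separate timeout rule, which is not shown here.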

Darryl Hein
+1  A: 

Here is a simple class I coded for my web application. It allows for forking PHP scripts and other scripts. Works on UNIX and Windows.

class BackgroundProcess {
    // Run a shell command in the background without waiting for it to finish.
    public static function open($exec, $cwd = null) {
        if (!is_string($cwd)) {
            $cwd = @getcwd();
        }

        @chdir($cwd);

        if (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN') {
            // Windows: needs the COM extension. 0 = hidden window, false = don't wait.
            $WshShell = new COM("WScript.Shell");
            $WshShell->CurrentDirectory = str_replace('/', '\\', $cwd);
            $WshShell->Run($exec, 0, false);
        } else {
            // UNIX: discard all output and detach with a trailing "&".
            exec($exec . " > /dev/null 2>&1 &");
        }
    }

    // Run a PHP script in the background, locating the PHP binary if none is given.
    public static function fork($phpScript, $phpExec = null) {
        $cwd = dirname($phpScript);

        @putenv("PHP_FORCECLI=true");

        if (!is_string($phpExec) || !file_exists($phpExec)) {
            if (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN') {
                $phpExec = str_replace('/', '\\', dirname(ini_get('extension_dir'))) . '\php.exe';

                if (@file_exists($phpExec)) {
                    BackgroundProcess::open(escapeshellarg($phpExec) . " " . escapeshellarg($phpScript), $cwd);
                }
            } else {
                $phpExec = (string) exec("which php-cli");

                // strncmp() is safe on an empty string, unlike $phpExec[0].
                if (strncmp($phpExec, '/', 1) !== 0) {
                    $phpExec = (string) exec("which php");
                }

                if (strncmp($phpExec, '/', 1) === 0) {
                    BackgroundProcess::open(escapeshellarg($phpExec) . " " . escapeshellarg($phpScript), $cwd);
                }
            }
        } else {
            if (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN') {
                $phpExec = str_replace('/', '\\', $phpExec);
            }

            BackgroundProcess::open(escapeshellarg($phpExec) . " " . escapeshellarg($phpScript), $cwd);
        }
    }
}
Andrew Moore
+3  A: 

Another way to fork processes is via curl. You can set up your internal tasks as a webservice. For example:

Then in your user accessed scripts make calls to the service:

$service->addTask('t1', $data); // post data to URL via curl

Your service can keep track of the queue of tasks with MySQL or whatever you like. The point is that it's all wrapped up within the service, and your script is just consuming URLs. This frees you up to move the service to another machine/server if necessary (i.e. it's easily scalable).
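As a rough illustration of what a call like addTask() might do underneath, here is a sketch using PHP's curl extension. The URL layout (base URL plus task name) and the function signature are assumptions, not something the answer specifies. Short timeouts keep a slow or dead service from holding up the user-facing script.

```php
<?php
// Hypothetical client: POST task data to an internal task service.
// The URL scheme "<base>/<task>" is an assumption for this sketch.
function addTask(string $baseUrl, string $task, array $data): bool
{
    $ch = curl_init(rtrim($baseUrl, '/') . '/' . rawurlencode($task));
    curl_setopt_array($ch, [
        CURLOPT_POST              => true,
        CURLOPT_POSTFIELDS        => http_build_query($data),
        CURLOPT_RETURNTRANSFER    => true, // capture the response instead of echoing it
        CURLOPT_CONNECTTIMEOUT_MS => 500,  // give up quickly if the service is down
        CURLOPT_TIMEOUT_MS        => 2000, // the service should only enqueue, not run, the task
    ]);

    $response = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // Treat any 2xx reply as "task accepted".
    return $response !== false && $status >= 200 && $status < 300;
}
```

The service should reply as soon as the task is queued; if it ran the task before answering, the caller would be waiting again.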

Adding HTTP authorization or a custom authorization scheme (like Amazon's web services) lets you open up your tasks to be consumed by other people/services (if you want), and you could take it further and add a monitoring service on top to keep track of queue and task status.

It does take a bit of set-up work but there are a lot of benefits.

rojoca
+2  A: 

I've used Beanstalkd for one project, and plan to use it again. I've found it to be an excellent way to run asynchronous processes.

A couple of things I've done with it are:

  • Image resizing - with a lightly loaded queue passing jobs off to a CLI-based PHP script, resizing large (2 MB+) images worked just fine, whereas resizing the same images within a mod_php instance regularly ran into memory-space issues (I limited the PHP process to 32 MB, and the resizing took more than that)
  • near-future checks - beanstalkd has delays available to it (make this job available to run only after X seconds) - so I can fire off 5 or 10 checks for an event, a little later in time

I wrote a Zend Framework-based system to decode a 'nice' URL, so for example, to resize an image it would call QueueTask('/image/resize/filename/example.jpg'). The URL was first decoded to an array(module,controller,action,parameters), and then converted to JSON for injection into the queue itself.
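The decoding step might look roughly like the sketch below. It does not use Zend Framework; the function name encodeTask() is invented, and the URL layout is an assumption: /controller/action/key/value with a module of "default". The real router rules may well differ.

```php
<?php
// Hypothetical: turn a 'nice' URL into the JSON payload put on the queue.
// Assumed layout: /controller/action/key1/value1/key2/value2...
function encodeTask(string $url): string
{
    $segments = explode('/', trim($url, '/'));
    $controller = array_shift($segments);
    $action = array_shift($segments);

    // Remaining segments are read as key/value pairs.
    $parameters = [];
    for ($i = 0; $i + 1 < count($segments); $i += 2) {
        $parameters[$segments[$i]] = $segments[$i + 1];
    }

    return json_encode([
        'module'     => 'default', // assumption: no module segment in the URL
        'controller' => $controller,
        'action'     => $action,
        'parameters' => $parameters,
    ]);
}
```

Keeping the payload as plain JSON means the worker needs nothing from the web request to run the job.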

A long-running CLI script then picked up the job from the queue, ran it (via Zend_Router_Simple), and if required, put information into memcached for the website PHP to pick up as required when it was done.

One wrinkle I also put in was that the CLI script only ran for 50 loops before restarting. If it wanted to restart as planned, it would do so immediately (it was run via a bash script). If there was a problem and it exited with exit(0) (the default value for exit; or die();), the bash script would first pause for a couple of seconds.
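The worker loop described above could be sketched like this. No real beanstalkd client is used: the $reserveJob and $handleJob callables are placeholders for whatever client library and dispatcher are in use, and the planned-restart exit code of 10 is an assumed value. The point is that a planned restart returns a distinct non-zero code, so the wrapper script can tell it apart from the default status 0 that a stray exit; or die(); produces.

```php
<?php
// Hypothetical worker loop: process a fixed number of iterations, then ask
// the wrapper script for a restart by returning a dedicated exit code.
function runWorker(
    callable $reserveJob,        // returns a JSON string, or null when idle
    callable $handleJob,         // receives the decoded job array
    int $maxLoops = 50,
    int $plannedRestartCode = 10 // assumed value, chosen for this sketch
): int {
    for ($i = 0; $i < $maxLoops; $i++) {
        $payload = $reserveJob();
        if ($payload === null) {
            continue; // nothing to do this time round
        }

        $job = json_decode($payload, true);
        if (!is_array($job)) {
            continue; // skip malformed payloads
        }

        $handleJob($job);
    }

    return $plannedRestartCode;
}
```

The CLI entry point would end with exit(runWorker(...)); the wrapper then restarts immediately on that code and sleeps a couple of seconds first on anything else. Restarting every so often also limits how much memory a leaky job can accumulate.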

Alister Bulman
I like the look of beanstalkd, once they add persistence I think it will be perfect.
davr
That's already in the codebase and being stabilised. I'm also looking forward to 'named jobs', so I can throw things in there, but know it won't be added if there's already one there. Good for regular events.
Alister Bulman