views: 398
answers: 3

Whenever a new user signs up on my site, I want to do some pre-processing to shorten their searches in the future. This involves anywhere from 30 to 2 minutes of processing time. Obviously I cannot do this when they click the submit button on signup, or on any PHP page they visit. However, I would like this done within 5 minutes of them signing up (or less).

Cron Route: I THINK this needs to be in a cron job. If so, what should my cron line look like to run every 2 minutes, and how can I ensure that one run of the cron job doesn't overlap the next?

Event/Fork Route (Preferred): If I could fire some event on my server without disrupting the user's experience, or fork a process off the user's signup (instead of using a cron job), how could I do this?

+5  A: 

I would recommend neither solution.

Instead, you would be best off with a long-running process (daemon) that gets its jobs from a message queue. The message queue itself could be backed by a database table if that is your preferred method.

You will post an identifier for the job to your database, and then a long running process will iterate through them once in a while and act upon them.

This is as simple as:

<?php
while (true) {
   $jobs = getListOfJobsFromDatabase();  // get the pending jobs from the database (placeholder function)
   foreach ($jobs as $job) {
      processAJob($job); // do whatever needs to be done for the job
      deleteJobFromDatabase($job); // remember to delete the job once it's done!
   }
   sleep(60); // sleep for a while so it doesn't thrash your database when there's nothing to do
}
?>

And just run that script from the command line.

The benefit of this over a cron job is that you won't get a race condition: a single worker processes the queue sequentially, so two runs can never pick up the same job.

You may also want to fork off the actual processing of the jobs so that many can be done in parallel, rather than processing them sequentially.

Matt
How could I ensure that this is running, and if it crashes, recover and restart it?
Mike Curry
You could wrap it in a loop within a shell script. That way, when it terminates, via a crash or otherwise, the shell script will relaunch it.
Matt
Should I spawn this from an hourly cron (run only if not already running)?
Mike Curry
There's no need to schedule it via cron. I would simply launch it either as a startup service at boot or at the command line via "screen".
Matt
This looks like overkill to me
drikoda
You could do the same basic thing with a cron job that runs every few minutes and pops the newest job off the queue. Also, I don't understand your suggestion that cron will inherently lead to a race condition.
acrosman
I was going to ask the same thing as acrosman. Could you clarify what race condition your version prevents compared with a cron job?
dimo414
If you run it from a cron job and pop the task off the queue before it has run, you risk the task never being completed if something goes wrong. Likewise, by NOT popping the task off the queue, you risk another cron job starting before the task is complete (and removed from the queue) and attempting to perform the same task. Hence the race condition.
Matt
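One common way to avoid both failure modes described in the comment above is to leave the job in the table and claim it with a conditional UPDATE, so only one worker can move it from pending to running. The following is a minimal sketch, not the answerer's code: the jobs table, its status column, and the claimJob/finishJob helpers are all hypothetical, and it assumes the pdo_sqlite extension is available for the in-memory demo.

```php
<?php
// Hypothetical schema: jobs(id, status), where status is 'pending', 'running' or 'done'.

// Try to claim a job. Returns true only for the caller whose UPDATE actually changed the row.
function claimJob(PDO $db, int $jobId): bool
{
    // The "AND status = 'pending'" condition makes the claim atomic at the database level:
    // a second caller finds no matching row and changes nothing.
    $stmt = $db->prepare("UPDATE jobs SET status = 'running' WHERE id = :id AND status = 'pending'");
    $stmt->execute([':id' => $jobId]);
    return $stmt->rowCount() === 1;
}

// Mark a claimed job as finished. The row stays in the table until this point,
// so a crash mid-job leaves a visible 'running' row instead of a lost task.
function finishJob(PDO $db, int $jobId): void
{
    $stmt = $db->prepare("UPDATE jobs SET status = 'done' WHERE id = :id AND status = 'running'");
    $stmt->execute([':id' => $jobId]);
}

// Demo against an in-memory SQLite database (an assumption made for illustration only).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT NOT NULL)");
$db->exec("INSERT INTO jobs (id, status) VALUES (1, 'pending')");

$first  = claimJob($db, 1); // true: this caller owns the job
$second = claimJob($db, 1); // false: the job is already running

echo 'first claim: ', var_export($first, true), "\n";
echo 'second claim: ', var_export($second, true), "\n";
```

With this pattern, a job left in the running state after a crash still needs some policy for retrying it (for example, a timestamp and a timeout); that part is not shown here.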
A: 

Here's what I think. Have a single cron job for all your users. This way you can ensure there is no overlapping by putting the users in a table that works like a queue. Run the cron every hour, but first check whether the queue is empty. If it is not, skip the cron job for that hour and try again the next.

drikoda
As mentioned by the OP, the process could take anything up to 30 minutes to complete. As such, if he had 3 members sign up to the site, you would still encounter a race condition if running hourly. Additionally, he would like the process to run within 5 minutes of a signup, whereas your suggestion would only run hourly, meaning the process could be scheduled up to 55 minutes late.
Matt
+1  A: 

You can use the following class to invoke a background PHP task.

class BackgroundProcess {
    static function open($exec, $cwd = null) {
     if (!is_string($cwd)) {
      $cwd = @getcwd();
     }

     @chdir($cwd);

     if (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN') {
      $WshShell = new COM("WScript.Shell");
      $WshShell->CurrentDirectory = str_replace('/', '\\', $cwd);
      $WshShell->Run($exec, 0, false);
     } else {
      exec($exec . " > /dev/null 2>&1 &");
     }
    }

    static function fork($phpScript, $phpExec = null) {
     $cwd = dirname($phpScript);

     if (!is_string($phpExec) || !file_exists($phpExec)) {
      if (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN') {
       $phpExec = str_replace('/', '\\', dirname(ini_get('extension_dir'))) . '\php.exe';

       if (@file_exists($phpExec)) {
        BackgroundProcess::open(escapeshellarg($phpExec) . " " . escapeshellarg($phpScript), $cwd);
       }
      } else {
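       $phpExec = exec("which php-cli");

       // exec() returns an empty string (or false) when nothing is found, so don't index into it
       if (!is_string($phpExec) || strpos($phpExec, '/') !== 0) {
        $phpExec = exec("which php");
       }

       if (is_string($phpExec) && strpos($phpExec, '/') === 0) {
        BackgroundProcess::open(escapeshellarg($phpExec) . " " . escapeshellarg($phpScript), $cwd);
       }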
      }
     } else {
      if (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN') {
       $phpExec = str_replace('/', '\\', $phpExec);
      }

      BackgroundProcess::open(escapeshellarg($phpExec) . " " . escapeshellarg($phpScript), $cwd);
     }
    }
}

Use as such:

BackgroundProcess::fork('process_user.php');
Andrew Moore
I don't understand the downvote. It works, it's elegant, and most of all, doesn't need manual input to run compared to the other solution proposed above.
Andrew Moore
It's certainly a nice solution, and I'll upvote you for it. My main concern with this particular setup is the lack of control over the number of spawned processes. If you have no central way to manage the pool of workers, you could end up with a massive number of jobs running in parallel, which would severely impact performance and possibly exhaust system resources.
Matt
It's up to the poster to implement that. The spawned process could easily check whether another process is already running in the background and let that process handle this user too.
Andrew Moore
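One way the spawned process could check whether another process is already running is a non-blocking exclusive lock on a lock file. This is only a sketch of that idea, not part of the class above: acquireWorkerLock, releaseWorkerLock, and the lock file path are hypothetical names, and it assumes a platform where flock() locks conflict between separately opened handles (as on Linux).

```php
<?php
// Try to become the only running worker.
// Returns the open lock handle on success, or null if another process holds the lock.
function acquireWorkerLock(string $lockFile)
{
    $handle = fopen($lockFile, 'c'); // 'c' creates the file if missing, without truncating it
    if ($handle === false) {
        return null;
    }
    // LOCK_NB makes flock() return immediately instead of waiting for the lock.
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return null;
    }
    return $handle; // keep this handle open for as long as the work runs
}

// Release the lock so the next spawned process can take over.
function releaseWorkerLock($handle): void
{
    flock($handle, LOCK_UN);
    fclose($handle);
}

// Hypothetical lock file location, chosen only for this example.
$lockFile = sys_get_temp_dir() . '/process_user.lock';

$lock = acquireWorkerLock($lockFile);
if ($lock === null) {
    // Another worker is active; it will pick up the newly queued user.
    echo "Another worker is already running.\n";
} else {
    // ... process every queued user here ...
    releaseWorkerLock($lock);
    echo "Work finished, lock released.\n";
}
```

The operating system drops the lock when the process exits, even after a crash, so a stale lock file does not block later runs. The same check also works at the top of a script started from cron, which keeps one run from overlapping the next.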