views: 175 · answers: 1

Consider a PHP web application whose purpose is to accept user requests to start generic asynchronous jobs, and then create a worker process/thread to run the job. The jobs are not particularly CPU or memory intensive, but are expected to block on I/O calls fairly often. No more than one or two jobs should be started per second, but due to the long run times there may be many jobs running at once.

Therefore, it's of utmost importance that the jobs run in parallel. Also, each job must be monitored by a manager daemon responsible for killing hung workers, aborting workers on user request, etc.

What is the best way to go about implementing a system such as this? I can see:

  1. Forking a worker from the manager - this appears to be the lowest-level option, and I would have to implement a monitoring system myself. Apache is the web server, so it appears that this option would require any PHP workers to be started via FastCGI.
  2. Use some sort of job/message queue. (gearman, beanstalkd, RabbitMQ, etc.) - Initially, this seemed like the obvious choice. After some research, I'm somewhat confused by all of the options. For instance, Gearman looks like it's designed for huge distributed systems with a fixed pool of workers, so I don't know if it's right for what I need (one worker per job).
+2  A: 

Well, if you're on Linux, you can use `pcntl_fork()` to fork off children. The "master" then watches the children. Each child completes its task and then exits normally.

Personally, in my implementations I've never needed a message queue. I simply used an array in the "master" with locks. When a child gets a job, it writes a lock file containing the job id. The master then waits for that child to exit. If the lock file still exists after the child has exited, I know the task wasn't completed, and I re-launch a child with the same job (after removing the lock file).

Depending on your situation, you could implement the queue in a simple database table: insert jobs into the table, have the master check it every 30 or 60 seconds for new jobs, and only delete a job once the child has finished (and removed its lock file). This would have issues if you had more than one "master" running at a time, but you could implement a global "master pid file" to detect and prevent multiple instances...
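The fork-and-lock-file scheme above could be sketched roughly like this (assuming the pcntl extension under CLI PHP on Linux; the job ids, `run_job()`, and file paths are illustrative, not part of the original answer):

```php
<?php
// Sketch of the fork + lock-file scheme: each child writes a lock file,
// does its work, and removes the lock on success. A surviving lock file
// tells the master the job must be re-queued.

function run_job(int $jobId): bool {
    usleep(100000); // stand-in for the real (mostly I/O-bound) work
    return true;    // true = completed cleanly
}

$jobs     = [1, 2, 3];          // e.g. rows pulled from a job table
$lockDir  = sys_get_temp_dir();
$children = [];

foreach ($jobs as $jobId) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("fork failed\n");
    }
    if ($pid === 0) {
        // Child: write the lock file, do the work, remove the lock on success.
        $lock = "$lockDir/job_$jobId.lock";
        file_put_contents($lock, (string)$jobId);
        if (run_job($jobId)) {
            unlink($lock);      // a missing lock file tells the master "done"
        }
        exit(0);
    }
    $children[$pid] = $jobId;   // master: remember which pid runs which job
}

// Master: reap each child; a surviving lock file means the job failed.
$done = [];
foreach ($children as $pid => $jobId) {
    pcntl_waitpid($pid, $status);
    $lock = "$lockDir/job_$jobId.lock";
    if (file_exists($lock)) {
        unlink($lock);
        echo "job $jobId failed, re-queueing\n";
    } else {
        $done[] = $jobId;
        echo "job $jobId done\n";
    }
}
```

In a real manager you would fold this into a loop that polls the job table, caps the number of live children, and enforces per-job timeouts.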

And I would not suggest forking under FastCGI. It can result in some very obscure problems, since the environment is meant to persist. Instead, use CGI if you must have a web interface, but ideally use a CLI app (a daemon). To interface with the master from other processes, you can either use sockets for TCP communication or create a FIFO file for communication.
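A minimal sketch of the FIFO option, assuming the posix and pcntl extensions; the path and the "abort <id>" command format are invented for the example (here a forked child stands in for the other process):

```php
<?php
// FIFO command channel into the master daemon: another process writes a
// one-line command, the master blocks reading it and acts on it.

$fifo = sys_get_temp_dir() . '/manager_' . getmypid() . '.cmd';
@unlink($fifo);
if (!posix_mkfifo($fifo, 0600)) {
    die("could not create FIFO\n");
}

$pid = pcntl_fork();
if ($pid === 0) {
    // Pretend to be another process asking the master to abort job 42.
    file_put_contents($fifo, "abort 42\n"); // blocks until the master reads
    exit(0);
}

// Master: opening the FIFO for reading blocks until a writer connects.
$fh  = fopen($fifo, 'r');
$cmd = trim(fgets($fh));
fclose($fh);
pcntl_waitpid($pid, $status);
unlink($fifo);

[$action, $jobId] = explode(' ', $cmd);
echo "received: $action job $jobId\n";
```

A TCP or Unix-domain socket (via `stream_socket_server()`) works the same way but also lets remote processes connect.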

As for detecting hung workers, you could implement a "heartbeat" system, where the child sends a SIGUSR1 to the master process every so many seconds. If you haven't heard from a child within two or three of those intervals, it may be hung. The catch is that since PHP isn't multi-threaded, you can't tell whether a child is hung or just waiting on a blocking resource (like a database call)... To automate the heartbeat you could use a tick function, but keep in mind that ticks won't fire during blocking calls...
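The heartbeat idea could be sketched like this, assuming pcntl + posix. Here the master just counts beats; a real manager would record the timestamp of the last beat per child and flag any child that goes quiet. Note that identical pending signals can coalesce, so the count is a lower bound:

```php
<?php
// Heartbeat sketch: the child sends SIGUSR1 to the master between units
// of work; the master's handler runs via ticks / signal dispatch.
declare(ticks=1); // let the signal handler run between statements

$beats = 0;
pcntl_signal(SIGUSR1, function () use (&$beats) {
    $beats++; // a real manager would store time() per child pid here
});

$masterPid = getmypid();
$pid = pcntl_fork();
if ($pid === 0) {
    for ($i = 0; $i < 3; $i++) {
        usleep(20000);                    // stand-in for a unit of work
        posix_kill($masterPid, SIGUSR1);  // the heartbeat itself
    }
    exit(0);
}

// Signals may interrupt waitpid, so retry until the child is reaped.
while (pcntl_waitpid($pid, $status) !== $pid) {
    pcntl_signal_dispatch();
}
pcntl_signal_dispatch(); // deliver any still-pending heartbeats
echo "heartbeats received: $beats\n";
```

This also illustrates the limitation from the answer: a child stuck in a blocking call stops sending beats exactly as a truly hung child would.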

ircmaxell
@ircmaxell Great info!
Codex73
Thanks. I've done this a few times now, and it works REALLY well. Well, I should say it works really well if your use cases are aligned with the limitations of the system (IPC is fairly expensive, etc). If they are not well aligned, you should use a true threading implementation and a language other than PHP...
ircmaxell
Careful with `pcntl_fork()`, though. I've had issues with database connections being shared in weird ways between the parent and child processes. I wouldn't be surprised if some PECL extensions share similar quirks. I'd shy away from forking in PHP, and spawn separate processes via `exec()` and the like, just to keep things simple
Frank Farmer
@ircmaxell Interesting comment about FastCGI. Didn't know that, but it makes sense. With this particular app, there is no reason why the master/manager can't be a CLI daemon, so I may try that approach.
Joshua Johnson