views:

287

answers:

7

Hello, I'm working on a consumer web app that needs to run a long-running background process tied to each customer request. By long-running, I mean anywhere from 1 to 3 minutes.

Here is an example flow. The object/widget doesn't really matter.

  1. Customer comes to the site and specifies object/widget they are looking for.
  2. We search/clean/filter for widgets matching some initial criteria. <-- long-running process
  3. Customer further configures more detail about the widget they are looking for.
  4. When the long-running process is complete, the customer can finish the last few steps before conversion.

Steps 3 and 4 aren't really important. I just mention them because we can buy some time while the long-running process is working.

The environment we are working in is a LAMP stack, currently using PHP. It doesn't seem like good design to have the long-running process tie up an Apache thread in mod_php (or a FastCGI process). The Apache layer of our app should be focused on serving content, not data processing, IMO.

A few questions:

  1. Is our thinking right in that we should separate this "long-running" part out of the Apache/web-app layer?
  2. Is there a standard/typical way to break this out under Linux/Apache/MySQL/PHP (we're open to using a different language for the processing if appropriate)?
  3. Any suggestions on how to go about breaking it out? For example, should we create a daemon that churns through a FIFO queue?

Edit: Just to clarify, only about a quarter of the long-running process is database-centric. We're working on optimizing that part; there is some work we could potentially do, but we're limited in how much we can do right now.

Thanks!

A: 

This is the poor man's solution:

exec("/usr/bin/php long_running_process.php > /dev/null 2>&1 &");

Note that the output has to be redirected (stderr included) and the command backgrounded with &; otherwise exec() blocks until the script finishes.

Alternatively you could:

  1. Insert a row into your database with details of the background request, which a daemon can then pick up and process (see the sketch after this list).

  2. Write a message to a message queue, which a daemon then reads and processes.
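
A minimal sketch of option 1, assuming a hypothetical jobs table with id, payload, status, and result columns (the table name and schema are invented for illustration). The web request just inserts a row and returns immediately; a matching daemon loop is sketched further down the page:

<?php
// Web-request side: enqueue the work and return right away.
// Assumes a hypothetical table:
//   CREATE TABLE jobs (id INT AUTO_INCREMENT PRIMARY KEY, payload TEXT,
//                      status VARCHAR(10), result TEXT);
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$stmt = $pdo->prepare("INSERT INTO jobs (payload, status) VALUES (:payload, 'pending')");
$stmt->execute([':payload' => json_encode(['widget' => $_POST['widget']])]);

// Hand the job id back to the page so it can poll for completion later.
$jobId = $pdo->lastInsertId();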

jonnii
+1  A: 

Consider providing the search results via AJAX from a web service instead of your application. Presumably you could offload this to another server and let your web application deal with the content as you desire.
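
One hedged sketch of the AJAX side: a small status endpoint the page can poll every few seconds until the background work finishes (the jobs table and its columns are assumptions for illustration, not part of the original answer):

<?php
// status.php - hypothetical endpoint the browser polls via AJAX.
// Returns the job's status (and its result, once finished) as JSON.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$stmt = $pdo->prepare("SELECT status, result FROM jobs WHERE id = :id");
$stmt->execute([':id' => (int)$_GET['job_id']]);
$job = $stmt->fetch(PDO::FETCH_ASSOC);

header('Content-Type: application/json');
echo json_encode($job ?: ['status' => 'unknown']);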

Just curious: 1-3 minutes seems like a long time for a lookup query. Have you looked at indexes on the columns you are querying to improve the speed? Or do you need to do some algorithmic processing? Perhaps you could perform some of it offline and prepopulate common searches with hints.

tvanfosson
Hello, we have done several passes over the schema to make sure it is indexed properly. We're somewhat limited in that we can't do much offline processing (because of usage agreements for the data).
drsnyder
A: 

Not a complete answer, but I would consider using AJAX and handing the second step off to something faster than PHP (C, C++, C#), then having a PHP function pick the results up from some store, most likely just a database.

Clutch
A: 

Here's some discussion on the Java version of this problem.

See java: what are the best techniques for communicating with a batch server

Two important things you might do:

  1. Switch to Java and use JMS.

  2. Read up on JMS but use another queue manager. Unix named pipes, for instance, might be an acceptable implementation (see the sketch after this list).
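
A rough PHP sketch of the named-pipe idea (the FIFO path and JSON message format are invented; posix_mkfifo() requires the POSIX extension, and note that fopen() on a FIFO blocks until the other end is open):

<?php
// Shared setup: create the FIFO once.
$fifo = '/tmp/widget_jobs.fifo';
if (!file_exists($fifo)) {
    posix_mkfifo($fifo, 0600);
}

// Web-request side: write one newline-terminated job per request.
$fp = fopen($fifo, 'w');           // blocks until the daemon is reading
fwrite($fp, json_encode(['widget' => 'example']) . "\n");
fclose($fp);

// Daemon side: block on the FIFO and process jobs as they arrive.
while (true) {
    $fp = fopen($fifo, 'r');
    while (($line = fgets($fp)) !== false) {
        $job = json_decode($line, true);
        // ... do the long-running work here ...
    }
    fclose($fp);                   // writer closed; reopen and wait again
}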

S.Lott
Apache ActiveMQ supports various languages and protocols, including PHP through Stomp: http://stomp.codehaus.org/PHP+Connectivity.
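
For illustration, enqueueing a job over Stomp with the PECL stomp extension might look like this (the broker address and queue name are assumptions):

<?php
// Hypothetical example: send a job to ActiveMQ via the PECL stomp extension.
$stomp = new Stomp('tcp://localhost:61613');
$stomp->send('/queue/widget-jobs', json_encode(['widget' => 'example']));
unset($stomp);   // closes the connection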
Phillip Whelan
A: 

Java servlets can do background processing. You could do something similar in any web technology with threading support. I don't know about PHP though.

Bruno De Fraine
+1  A: 

As Jonnii suggested, you can start a child process to carry out background processing. However, this needs to be done with some care:

  • Make sure that any parameters passed through are escaped correctly
  • Ensure that more than one copy of the process does not run at once

If several copies of the process can run, there's nothing stopping a (not even malicious, just impatient) user from hitting reload on the page that kicks it off, eventually starting so many copies that the machine runs out of RAM and grinds to a halt.
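
A sketch of both precautions (the file paths are invented): escape user input with escapeshellarg() before it touches the shell, and have the worker script itself hold an exclusive lock so a second copy exits immediately:

<?php
// Web side: escape the user-supplied value before building the command.
$widget = escapeshellarg($_POST['widget']);
exec("/usr/bin/php long_running_process.php $widget > /dev/null 2>&1 &");

<?php
// Top of long_running_process.php: hold an exclusive, non-blocking lock
// for the lifetime of the process so only one copy can run at a time.
$lock = fopen('/tmp/long_running_process.lock', 'c');
if (!flock($lock, LOCK_EX | LOCK_NB)) {
    exit(1);   // another copy is already running
}
// ... the long-running work ...
// the lock is released automatically when the process exits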

So you can use a subprocess, but do it carefully, in a controlled manner, and test it properly.

Another option is to have a daemon permanently running, waiting for requests, processing them, and then recording the results somewhere (perhaps in a database).
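
A minimal polling daemon along those lines, continuing the jobs table assumed in the sketches above (do_long_running_work() is a hypothetical stand-in for the actual processing, and SELECT ... FOR UPDATE assumes InnoDB):

<?php
// daemon.php - run under supervision (cron @reboot, daemontools, etc.).
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

while (true) {
    // Claim the oldest pending job inside a transaction so that two
    // daemons cannot grab the same row.
    $pdo->beginTransaction();
    $job = $pdo->query("SELECT id, payload FROM jobs
                        WHERE status = 'pending'
                        ORDER BY id LIMIT 1 FOR UPDATE")->fetch(PDO::FETCH_ASSOC);
    if ($job) {
        $pdo->prepare("UPDATE jobs SET status = 'running' WHERE id = ?")
            ->execute([$job['id']]);
        $pdo->commit();

        $result = do_long_running_work(json_decode($job['payload'], true));

        $pdo->prepare("UPDATE jobs SET status = 'done', result = ? WHERE id = ?")
            ->execute([$result, $job['id']]);
    } else {
        $pdo->commit();
        sleep(5);   // queue is empty; wait before polling again
    }
}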

MarkR
A: 

I'm not a web-head, but there's a method that works on all single-thread performance problems if you can exercise it under an IDE: How to Optimize Your Program's Performance. The fact that you say a good part of it is DB-based makes me suspect even more that this could help.

Mike Dunlavey