Hi, I maintain a custom-built CMS-like application.

Whenever a document is submitted, several tasks are performed that can be roughly grouped into the following categories:

  1. MySQL queries.
  2. HTML content parsing.
  3. Search index updating.

Category 1 includes updates to various MySQL tables relating to a document's content.

Category 2 includes parsing of HTML content stored in MySQL LONGTEXT fields to perform some automatic anchor tag transformations. I suspect that a great deal of computation time is spent in this task.

Category 3 includes updates to a simple MySQL-based search index using just a handful of fields corresponding to the document.

All of these tasks need to complete for the document submission to be considered complete.

The machine that hosts this application has dual quad-core Xeon processors (a total of 8 cores). However, whenever a document is submitted, all PHP code that executes is constrained to a single process running on one of the cores.

My question:

What schemes, if any, have you used to split up your PHP/MySQL web application processing load among multiple CPU cores? My ideal solution would basically spawn a few processes, let them execute in parallel on several cores, and then block until all of the processes are done.

Related question:

What is your favorite PHP performance profiling tool?

+1  A: 

This might not be the answer you are looking for, but the solution you seek deals with threading. Threading is necessary for multicore programming, and threading is not implemented in PHP.

But, in a sense, you could fake threading in PHP by relying on the operating system's multitasking abilities. I suggest taking a quick look at an overview of multi-threading strategies in PHP to develop a strategy that achieves what you need.
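One way to lean on the operating system like this is to start each task as its own PHP process with `proc_open` and then block until every child has exited, which is roughly the "spawn, run in parallel, wait" scheme the question asks for. The following is only a minimal sketch: it assumes PHP 7.4 or later (for array-form commands), that `proc_open` is not disabled on the host, and that each task can be run as a standalone command. The helper name `run_in_parallel` is made up for illustration.

```php
<?php
/**
 * Hypothetical helper: start every command as a separate OS process,
 * then block until all of them have finished.
 *
 * @param array $commands Map of task name => command as an argv-style array.
 * @return array Map of task name => ['exit' => int, 'output' => string].
 */
function run_in_parallel(array $commands): array
{
    $procs = [];

    // Start all children first so they run concurrently.
    foreach ($commands as $key => $cmd) {
        $pipes = [];
        $spec = [1 => ['pipe', 'w']]; // capture the child's stdout only
        $handle = proc_open($cmd, $spec, $pipes);
        if (!is_resource($handle)) {
            throw new RuntimeException("Could not start task: $key");
        }
        $procs[$key] = [$handle, $pipes[1]];
    }

    // Then collect results one by one; this is the "block until done" step.
    $results = [];
    foreach ($procs as $key => [$handle, $stdout]) {
        $output = stream_get_contents($stdout); // returns once the child closes stdout
        fclose($stdout);
        $results[$key] = [
            'exit' => proc_close($handle), // waits for the child and returns its exit code
            'output' => $output,
        ];
    }

    return $results;
}
```

A caller might pass something like `['links' => [PHP_BINARY, 'parse_links.php', $docId], 'index' => [PHP_BINARY, 'update_index.php', $docId]]`, where both script names are placeholders. Each child needs its own MySQL connection, since connections cannot be shared across processes.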

Anthony Forloney
+4  A: 

PHP is not really oriented towards multi-threading: as you already noticed, each page is served by one PHP process -- which does one thing at a time, including just "waiting" while an SQL query is executed on the database server.

There is not much you can do about that, unfortunately: it's the way PHP works.


Still, here are a couple of thoughts:

  • First of all, you'll probably have more than 1 user at a time on your server, which means you'll serve several pages at the same time, which, in turn, means you'll have several PHP processes and SQL queries running at the same time... which means several cores of your server will be used.
    • Each PHP process will run on one core, in response to the request of one user, but there are several sub-processes of Apache running in parallel (one for each request, up to a few dozen or a few hundred, depending on your configuration).
    • The MySQL server is multi-threaded, which means it can use several distinct cores to answer several concurrent requests -- even if each request cannot be served by more than one core.

So, in fact, your server's 8 cores will end up being used ;-)


And, if you think your pages are taking too long to generate, a possible solution is to separate your calculations into two groups:

  • On one hand, the things that have to be done to generate the page: for those, there is not much you can do.
  • On the other hand, the things that have to be run sometimes, but not necessarily immediately.
    • For instance, I am thinking about some statistics calculations: you want them to be quite up to date, but if they lag a couple of minutes behind, that's generally quite OK.
    • Same for e-mail sending: several minutes will pass before your users receive/read their mail anyway, so there is no need to send it immediately.

For the kind of situations in my second point, as you don't need those things done immediately... Well, just don't do them immediately ;-)
A solution that I often use is some queuing mechanism:

  • The web application stores things in a "todo-list"
  • And that "todo-list" is de-queued by some batches that are run frequently via a cronjob

And for some other manipulations, you just want them run every X minutes -- and, here too, a cronjob is the perfect tool.

Pascal MARTIN
I like the queuing mechanism idea. How have you implemented this in PHP?
jkndrkn
The simplest idea that comes to mind is using a table in your database: insert into it from the web application (with some kind of "timestamp" column), then select and delete the oldest rows from the batch that is run via cronjob. Other solutions would use specialized mechanisms (see http://framework.zend.com/manual/en/zend.queue.html for instance, or http://gearman.org/ )
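To make the table-based idea concrete, here is a rough sketch using PDO. Everything named here is an assumption made for illustration: the `todo_list` table, its columns, and the `queue_*` function names are not from any real schema. The `CREATE TABLE` statement uses SQLite syntax so the example can run against an in-memory database; on MySQL the `id` column would need `AUTO_INCREMENT`. It also assumes a single cron worker, since concurrent workers would need row locking.

```php
<?php
// Create the illustrative queue table (SQLite syntax; adjust for MySQL).
function queue_create(PDO $db): void
{
    $db->exec('CREATE TABLE IF NOT EXISTS todo_list (
        id INTEGER PRIMARY KEY,
        payload TEXT NOT NULL,
        created_at INTEGER NOT NULL
    )');
}

// Called from the web application: record work to be done later.
function queue_push(PDO $db, string $payload): void
{
    $stmt = $db->prepare(
        'INSERT INTO todo_list (payload, created_at) VALUES (?, ?)'
    );
    $stmt->execute([$payload, time()]);
}

// Called from the cron batch: fetch the oldest rows and delete them.
// Returns the payloads, oldest first.
function queue_pop_batch(PDO $db, int $limit): array
{
    $db->beginTransaction();

    $stmt = $db->prepare(
        'SELECT id, payload FROM todo_list ORDER BY created_at, id LIMIT ?'
    );
    $stmt->bindValue(1, $limit, PDO::PARAM_INT);
    $stmt->execute();
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

    if ($rows) {
        $ids = array_column($rows, 'id');
        $marks = implode(',', array_fill(0, count($ids), '?'));
        $db->prepare("DELETE FROM todo_list WHERE id IN ($marks)")
           ->execute($ids);
    }

    $db->commit();
    return array_column($rows, 'payload');
}
```

The web request would call `queue_push` and return right away, while a script run from cron calls `queue_pop_batch` and processes whatever it gets back. Deleting rows before processing them means a crashed batch loses its items; marking rows as "in progress" instead is a common variation.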
Pascal MARTIN
Thank you for your thoughts and advice.
jkndrkn
You're welcome :-) Have fun !
Pascal MARTIN