I have a PHP script that acts as a JSON API to my backend database.

Meaning, you send it an HTTP request like http://example.com/json/?a=1&b=2&c=3... and it returns a JSON object with the result set from my database.

PHP works great for this because it's literally about 10 lines of code.

But I also know that PHP is slow, and this API is being called about 40 times per second at peak times, and PHP is struggling to keep up.

Is there a way that I can compile my PHP script to a faster-executing format? I'm already using APC, which caches PHP's compiled opcodes, as well as FastCGI.

Or, does anyone recommend a language I rewrite the script in so that Apache can still process the example.com/json/ requests?

Thanks

UPDATE: I just ran some benchmarks:

  • The PHP script takes 0.6 seconds to complete.
  • If I take the SQL the PHP script generates and run it from the same web server, but directly through the MySQL command-line client (so network latency is still in play), fetching the result set takes only 0.09 seconds.

As you can see, the PHP script is roughly an order of magnitude slower at producing the results. The network does not appear to be the major bottleneck in this case, though I agree it typically is the root cause.

+7  A: 

Before you go optimizing something, first figure out if it's a problem. Considering it's only 10 lines of code (according to you) I very much suspect you don't have a problem. Time how long the script takes to execute. Bear in mind that network latency will typically dwarf trivial script execution times.

In other words: don't solve a problem until you have a problem.

You're already using an opcode cache (APC). It doesn't get much faster than that. More to the point, it rarely needs to get any faster than that.

If anything, you'll have problems with your database: too many connections (unlikely at 40 requests per second), connecting too slowly, or the big one: the query is too slow. If you find yourself in that situation, nine times out of ten effective indexing and database tuning is sufficient.

In the cases where it isn't, you reach for some kind of caching: memcached, beanstalkd and the like.

But honestly, 40 requests per second means these solutions are almost certainly over-engineering for something that isn't a problem.
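
If the query does turn out to be the slow part, a quick sanity check is to run EXPLAIN on the SQL the script generates. A minimal sketch, assuming a mysqli connection in $db and the generated query in $sql (both placeholders, not anything from the question):

// Sketch only: run EXPLAIN on the generated query and log the plan.
// $db (a mysqli connection) and $sql are placeholders.
$plan = $db->query('EXPLAIN ' . $sql);
while ($row = $plan->fetch_assoc()) {
    // "type: ALL" with a large "rows" estimate usually means a full table
    // scan -- a candidate for an index on the columns in the WHERE clause.
    error_log(print_r($row, true));
}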

cletus
Agreed. If you're returning results from a DB query, perhaps it's the query itself that is slow? Try profiling the query on the SQL server.
Amber
I ran a benchmark; see my updated post above.
Tedi
Good comment. In short, for a faster-executing format: APC, APC, APC, APC.
Alfred
A: 

You're already using APC opcode caching, which is good. If you find you're still not getting the performance you need, here are some other things you could try:

1) Put a Squid caching proxy in front of your web server. If your requests are highly cacheable, this might make good sense (see the sketch after this list).

2) Use memcached to cache expensive database lookups.
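
For 1) to work, the responses have to be cacheable in the first place. A minimal sketch of the headers the PHP script could send; the 60-second lifetime is just an assumed value:

// Sketch: mark the JSON response as cacheable so a proxy like Squid
// can serve repeat requests without hitting PHP or MySQL at all.
// The 60-second lifetime is an arbitrary assumption.
header('Content-Type: application/json');
header('Cache-Control: public, max-age=60');
echo json_encode($result_set);   // $result_set is whatever was fetched from MySQL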

Asaph
I ran a benchmark; see my updated post above. Network and I/O do not appear to be the bottleneck in this case.
Tedi
+1  A: 

The first rule of optimization is to make sure you actually have a performance problem. The second rule is to figure out where the performance problem is by measuring your code. Don't guess. Get hard measurements.

PHP is not going to be your bottleneck. I can pretty much guarantee that. Network bandwidth and latency will dwarf the small overhead of using PHP vs. a compiled C program. And if not network speed, then it will be disk I/O, or database access, or a really bad algorithm, or a host of other more likely culprits than the language itself.
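
If you want hard numbers for this particular script, a crude sketch of timing each phase (connect, query, fetch, encode) with microtime() is below; the connection details and $sql are placeholders, not the actual code from the question:

// Crude sketch: time each phase of the request so the slow part is obvious.
// Connection details and $sql are placeholders.
$t = array('start' => microtime(true));

$db = mysql_connect('localhost', 'user', 'pass');
mysql_select_db('mydb', $db);
$t['connect'] = microtime(true);

$r = mysql_query($sql, $db);
$t['query'] = microtime(true);

$rows = array();
while ($row = mysql_fetch_assoc($r)) {
    $rows[] = $row;
}
$t['fetch'] = microtime(true);

$json = json_encode($rows);
$t['encode'] = microtime(true);

error_log(sprintf('connect=%.3fs query=%.3fs fetch=%.3fs encode=%.3fs',
    $t['connect'] - $t['start'],
    $t['query'] - $t['connect'],
    $t['fetch'] - $t['query'],
    $t['encode'] - $t['fetch']));

echo $json;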

John Kugelman
I ran a benchmark; see my updated post above. Network and I/O do not appear to be the bottleneck in this case.
Tedi
A: 

Consider that if you're handling database updates, it's your MySQL performance that, IMO, needs attention. I would expand the test harness like so:

  • run mytop on the dbserver
  • run ab (apache bench) from a client, like your desktop
  • run top or vmstat on the webserver

And watch for these things:

  • updates to the table forcing reads to wait (MyISAM engine)
  • high load on the webserver (could indicate low memory conditions on webserver)
  • high disk activity on webserver, possibly from logging or other web requests causing random seeking of uncached files
  • memory growth of your Apache processes. If your result sets are getting transformed into large associative arrays, or getting serialized/deserialized, these can become expensive memory-allocation operations. Your code might need to stop accumulating the whole result set with mysql_fetch_assoc() and instead stream one row at a time to the output (see the sketch below).
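
A rough sketch of that last point, assuming the old mysql_* API and a placeholder $sql variable: emit each row straight into the JSON output instead of building a large array first.

// Sketch: stream the JSON array row by row. mysql_unbuffered_query() avoids
// buffering the whole result set inside the PHP process; $sql is a placeholder.
header('Content-Type: application/json');
$r = mysql_unbuffered_query($sql);
echo '[';
$first = true;
while ($row = mysql_fetch_assoc($r)) {
    if (!$first) {
        echo ',';
    }
    echo json_encode($row);
    $first = false;
}
echo ']';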

I often wrap my db queries with a little profiler adapter that I can toggle to log unusually long query times, like so:

// Wrapper around $dbcon->query() that logs any query slower than $thresh seconds.
function query( $sql, $dbcon, $thresh ) {
    $delta = array();
    $delta['begin'] = microtime( true );
    $result = $dbcon->query( $sql );
    $delta['finish'] = microtime( true );
    $delta['t'] = $delta['finish'] - $delta['begin'];
    if( $delta['t'] > $thresh ) {
        error_log( "query took {$delta['t']} seconds; query: $sql" );
    }
    return $result;
}
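
Hypothetical usage, assuming a mysqli connection and a 0.25-second threshold (both placeholder values):

$db = new mysqli('localhost', 'user', 'pass', 'mydb');   // placeholder credentials
$result = query("SELECT * FROM widgets WHERE a = 1", $db, 0.25);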

Personally, I prefer using xcache to APC, because I like the diagnostics page it comes with.

Chart your performance over time. Track the number of concurrent connections and see whether it correlates with performance issues. You can grep the number of HTTP connections out of netstat from a cron job and log that for analysis later.

Consider enabling your mysql query cache, too.

memnoch_proxy
+1  A: 

If your database is very read-heavy (I'm guessing it is) then a basic caching implementation would help, and memcached would make it very fast.

Let me change your URL structure for this example:

/widgets.json?a=1&b=2&c=3

For each call to your web service, you'd be able to parse the GET arguments and use those to create a key to use in your cache. Let's assume you're querying for widgets. Example code:

<?php
// a function to provide a consistent cache key for your resource
function cache_key($type, $params = array()){
    if(empty($type)){
        return false;
    }
    // order your parameters alphabetically by key.
    ksort($params);
    return sha1($type . serialize($params));
}

// you get the same cache key no matter the order of parameters
var_dump(cache_key('widgets', array('a' => 3, 'b' => 7, 'c' => 5)));
var_dump(cache_key('widgets', array('b' => 7, 'a' => 3, 'c' => 5)));


// now let's use some GET parameters.
// you'd probably want to sanitize your $_GET array, however you want;
// sanitize() here is a placeholder for whatever validation you use.
$_GET = sanitize($_GET);

// assuming URL of /widgets.json?a=1&b=2&c=3 results in the following func call:
$widgets_cache_key = cache_key('widgets', $_GET);

// connect to memcache (requires memcache pecl module)
$m = new Memcache;
$m->connect('127.0.0.1', 11211);

// try to get data from cache
$data = $m->get($widgets_cache_key);
if(empty($data)){
    // data is not in cache. grab it.
    $data = array();
    $r = mysql_query("SELECT * FROM widgets WHERE ...;");
    while($row = mysql_fetch_assoc($r)){
        $data[] = $row;
    }
    // now store data for next time.
    $m->set($widgets_cache_key, $data);
}

var_dump(json_encode($data));
?>
patcoll
+3  A: 

I've had a lot of luck using PHP, memcached and nginx's memcache module together for very fast results. The easiest way is to just use the full request URI as the cache key.

I'll assume this URL:

/widgets.json?a=1&b=2&c=3

Example PHP code:

<?php
$widgets_cache_key = $_SERVER['REQUEST_URI'];

// connect to memcache (requires memcache pecl module)
$m = new Memcache;
$m->connect('127.0.0.1', 11211);

// try to get data from cache
$json = $m->get($widgets_cache_key);
if(empty($json)){
    // data is not in cache. grab it.
    $data = array();
    $r = mysql_query("SELECT * FROM widgets WHERE ...;");
    while($row = mysql_fetch_assoc($r)){
        $data[] = $row;
    }
    // store the encoded JSON string (not the PHP array) so nginx's memcached
    // module can serve the cached value directly as the response body.
    $json = json_encode($data);
    $m->set($widgets_cache_key, $json);
}

echo $json;
?>

That in itself provides a huge performance boost. If you were to then use nginx as a front-end for Apache (put Apache on 8080 and nginx on 80), you could do this in your nginx config:

worker_processes  2;

events {
    worker_connections  1024;
}

http {
    include  mime.types;
    default_type  application/octet-stream;

    access_log  off;
    sendfile  on;
    keepalive_timeout  5;
    tcp_nodelay  on;
    gzip  on;

    upstream apache {
        server  127.0.0.1:8080;
    }

    server {
        listen  80;
        server_name  _;

        location / {
            if ($request_method = POST) {
                proxy_pass  http://apache;
                break;
            }
            set  $memcached_key $request_uri;
            memcached_pass  127.0.0.1:11211;
            default_type  text/html;
            proxy_intercept_errors  on;
            error_page  404 502 = /fallback;
        }

        location /fallback {
            internal;
            proxy_pass  http://apache;
            break;
        }
    }
}

Notice the set $memcached_key $request_uri; line. This sets the memcached cache key to the full request URI (path plus query string), matching the REQUEST_URI key the PHP script uses. So if nginx finds a cache entry for that key, it serves it directly from memory and never touches PHP or Apache. Very fast.

There is an unofficial Apache memcache module as well. I haven't tried it, but if you don't want to mess with nginx it may help you too.

patcoll
Why not just use md5 or sha1 on the memcache key?
AntonioCS
A: 

Please see this question. You have several options. Yes, PHP can be compiled to native ELF (and possibly even FatELF) format. The problem is all of the Zend creature comforts.

Tim Post
A: 

Since you already have APC installed, it can be used (similar to the memcached recommendations) to store objects. If you can cache your database results, do it! http://us2.php.net/manual/en/function.apc-store.php http://us2.php.net/manual/en/function.apc-fetch.php
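
A minimal sketch of that approach; the key prefix, the 60-second TTL and the query are placeholder values:

<?php
// Sketch: cache the fetched rows in APC, keyed on the request URI.
// Key prefix, TTL and the query are placeholders.
$key = 'json_api_' . sha1($_SERVER['REQUEST_URI']);

$data = apc_fetch($key);
if ($data === false) {
    // cache miss: query the database and store the result for next time.
    $data = array();
    $r = mysql_query("SELECT * FROM widgets WHERE ...");
    while ($row = mysql_fetch_assoc($r)) {
        $data[] = $row;
    }
    apc_store($key, $data, 60);   // cache for 60 seconds (arbitrary)
}

header('Content-Type: application/json');
echo json_encode($data);
?>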

philfreo
A: 

From your benchmark it looks like the php code is indeed the problem. Can you post the code?

What happens when you remove the MySQL code and just put in a hard-coded string representing what you'll get back from the db?

Since it takes 0.60 seconds from PHP and only 0.09 seconds from the MySQL CLI, I would guess that connection creation is taking too much time. PHP creates a new connection per request by default, and that can sometimes be slow.

Think about it: depending on your environment and your code, each request will:

  1. Resolve the hostname of the MySQL server to an IP
  2. Open a connection to the server
  3. Authenticate to the server
  4. Finally run your query

Have you considered using persistent MySQL connections or connection pooling? That effectively lets you jump straight to the query step above.
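
A sketch of the persistent-connection route using PDO; the DSN, credentials and query are placeholders:

<?php
// Sketch: PDO::ATTR_PERSISTENT re-uses an already-open MySQL connection
// across requests, skipping the resolve/connect/authenticate steps above.
// DSN, credentials and the query are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass', array(
    PDO::ATTR_PERSISTENT => true,
));
$stmt = $pdo->query('SELECT * FROM widgets WHERE a = 1');
$data = $stmt->fetchAll(PDO::FETCH_ASSOC);

header('Content-Type: application/json');
echo json_encode($data);
?>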

Caching is great for performance as well. I think others have covered this pretty well already.

jckdnk111