I'm writing a semi-simple database wrapper class and want it to have a fetching method that operates automagically: it should prepare each distinct statement only the first time around, and just bind and execute the query on successive calls.

I guess the main question is: how does re-preparing the same MySQL statement work? Will PDO magically recognize the statement (so I don't have to) and skip the preparation?

If not, I'm planning to do this by generating a unique key for each different query and keeping the prepared statements in a private array in the database object, each under its unique key (see the sketch after the list below). I'm planning to obtain the array key in one of the following ways (none of which I like). In order of preference:

  • have the programmer pass an extra, always-the-same parameter when calling the method - something along the lines of basename(__FILE__, ".php") . __LINE__ (this would only work when the method is called repeatedly from the same line, i.e. within a loop - which is the case most of the time this functionality is needed)
  • have the programmer pass a totally random string (most likely generated beforehand) as an extra parameter
  • use the passed query itself to generate the key - getting the hash of the query or something similar
  • achieve the same as the first bullet (above) by calling debug_backtrace
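
For illustration, here is a minimal sketch of that idea (the class and method names are made up); it keys the private statement cache by the query text itself, which is a simpler variant of the third option above:

<?php
class Db
{
    private $pdo;
    private $statements = array(); // prepared statements, keyed by query text

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
    }

    public function fetch($query, array $params = array())
    {
        // Prepare only on the first call for this exact query text;
        // later calls just bind new values and execute the cached statement.
        if (!isset($this->statements[$query])) {
            $this->statements[$query] = $this->pdo->prepare($query);
        }

        $stmt = $this->statements[$query];
        $stmt->execute($params);

        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }
}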

Does anyone have similar experience? Although the system I'm working on does deserve some attention to optimization (it's quite large and growing by the week), perhaps I'm worrying about nothing and there is no performance benefit in doing what I'm doing?

A: 

To my knowledge, PDO does not reuse already prepared statements: it does not analyse the query by itself, so it does not know whether it is the same query.

If you want to create a cache of prepared queries, the simplest way imho would be to md5-hash the query string and generate a lookup table.

OTOH: how many queries are you executing per minute? If it's fewer than a few hundred, then you are only complicating the code; the performance gain will be minor.

dbemerlin
I may be missing something, but why create the md5 of the query string? Why not use the query string _itself_?
Inshallah
Because the query string itself might be quite long, a lookup on it would be slower. A hash allows a quicker lookup (of course, if the queries are short enough and the lookups few enough it might actually be slower, but in the usual case a hash search is faster).
dbemerlin
What? MD5ing the query string for performance? That's completely insane :-). I've seen the indexing by query string method used in Perl, to good effect. Are you sure you know enough about the way PHP arrays are implemented to make such a recommendation?
Inshallah
It _might_ be unnecessary for PHP as I haven't done any benchmarking for this specific case, but imho it is always a good idea to use hash values for lookups even if the array hashes internally, as you otherwise might be tempted to do something like $key = <long query>; foreach ($stuff as $item) { $cache[$key]->do_stuff($item); }, which will force PHP to rehash the query on each loop iteration to look up the key. OTOH I usually don't care much about how the language itself implements these things, so I might be wrong there: you should benchmark it and find out.
dbemerlin
+1  A: 

Believe me, I've done this before, and after building a cache of prepared statements the performance gain was very noticeable - see this question: Preparing SQL Statements with PDO.

And this is the code I came up with afterwards, with cached prepared statements:

function DB($query)
{
    static $db = null;
    static $result = array();

    if (is_null($db) === true)
    {
        // First call: $query is the path to the SQLite database and is only used to open the connection.
        $db = new PDO('sqlite:' . $query, null, null, array(PDO::ATTR_ERRMODE => PDO::ERRMODE_WARNING));
    }

    else if (is_a($db, 'PDO') === true)
    {
        // Subsequent calls: cache the prepared statement under the MD5 hash of the query text.
        $hash = md5($query);

        if (empty($result[$hash]) === true)
        {
            $result[$hash] = $db->prepare($query);
        }

        if (is_a($result[$hash], 'PDOStatement') === true)
        {
            // Any arguments after the query string are passed to execute() as bound values.
            if ($result[$hash]->execute(array_slice(func_get_args(), 1)) === true)
            {
                if (stripos($query, 'INSERT') === 0)
                {
                    return $db->lastInsertId();
                }

                else if (stripos($query, 'SELECT') === 0)
                {
                    return $result[$hash]->fetchAll(PDO::FETCH_ASSOC);
                }

                else if ((stripos($query, 'UPDATE') === 0) || (stripos($query, 'DELETE') === 0))
                {
                    return $result[$hash]->rowCount();
                }

                else if (stripos($query, 'REPLACE') === 0)
                {
                    // REPLACE falls through to the generic success return below.
                }

                return true;
            }
        }

        return false;
    }
}

Since I don't need to worry about collisions in queries, I've ended up using md5() instead of sha1().
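
For reference, this is roughly how the helper is meant to be called (the table and column names are made up); the very first call only opens the connection, and every later call prepares the query once, caches it and reuses it:

DB('/path/to/database.sqlite');                                   // first call: opens the SQLite connection
$rows = DB('SELECT * FROM users WHERE id = ?', 1);                // prepared once, cached by md5($query); returns fetchAll()
$id   = DB('INSERT INTO users (name) VALUES (?)', 'Bob');         // returns lastInsertId()
$n    = DB('UPDATE users SET name = ? WHERE id = ?', 'Alice', 1); // returns rowCount()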

Alix Axel
I'm very interested to know why you made the decision to use MD5 hashing. I've posted a naive benchmark of MD5 hashing vs PHP's native array hashing, and couldn't find any improvement with the MD5 method.
Inshallah
@Inshallah: The decision to use `md5()` was just so I wouldn't have new lines (`\n`) in queries.
Alix Axel
+2  A: 

MySQL (like most DBMS) will cache execution plans for prepared statements, so if user A creates a plan for:

SELECT * FROM some_table WHERE a_col=:v1 AND b_col=:v2

(where v1 and v2 are bind vars) and then sends the values to be interpolated by the DBMS, and user B subsequently sends the same query (but with different values for interpolation), the DBMS does not have to regenerate the plan. I.e. it's the DBMS which finds the matching plan - not PDO.

However, this means that each operation on the database requires at least two round trips (the first to present the query, the second to present the bind vars), as opposed to a single round trip for a query with literal values, so it introduces additional network cost. There is also a small cost involved in dereferencing (and maintaining) the query/plan cache.
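
To make the round-trip difference concrete, a hedged sketch (this assumes native server-side prepares, i.e. PDO::ATTR_EMULATE_PREPARES turned off, and that $pdo is an already open connection; table and values are made up):

// Prepared route: at least two server round trips (PREPARE, then EXECUTE).
$pdo->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
$stmt = $pdo->prepare('SELECT * FROM some_table WHERE a_col = :v1 AND b_col = :v2');
$stmt->execute(array(':v1' => 1, ':v2' => 'x'));
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

// Literal route: a single round trip, the values travel inside the query text.
$rows = $pdo->query("SELECT * FROM some_table WHERE a_col = 1 AND b_col = 'x'")
            ->fetchAll(PDO::FETCH_ASSOC);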

The key question is whether this cost is greater than the cost of generating the plan in the first place.

While (in my experience) there definitely seems to be a performance benefit to using prepared statements with Oracle, I'm not convinced that the same is true for MySQL - a lot will depend on the structure of your database and the complexity of the query (or, more specifically, on how many different options the optimizer can find for resolving the query).

Try measuring it yourself (hint: you might want to set the slow query threshold to 0 and write some code to convert literal values back into anonymous representations for the queries written to the logs).
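
A rough sketch of what that normalisation step might look like (the regular expressions are only illustrative and would need tuning against real log output):

<?php
// Turn literal values in logged queries back into anonymous placeholders so
// that identical query "shapes" can be grouped and counted.
function normalize_query($sql)
{
    $sql = preg_replace("/'[^']*'/", '?', $sql);          // quoted string literals (naive: ignores escaped quotes)
    $sql = preg_replace('/\b\d+(\.\d+)?\b/', '?', $sql);  // numeric literals
    return preg_replace('/\s+/', ' ', trim($sql));        // collapse whitespace
}

// Both of these normalise to: SELECT * FROM some_table WHERE a_col = ? AND b_col = ?
echo normalize_query("SELECT * FROM some_table WHERE a_col = 'foo' AND b_col = 42"), "\n";
echo normalize_query("SELECT * FROM some_table WHERE a_col = 'bar' AND b_col = 7"), "\n";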

C.

symcbean
+1  A: 

OK, since I've been bashing methods of keying the queries for the cache, other than simply using the query string itself, I've done a naive benchmark. The following compares using the plain query string vs first creating the md5 hash:

$ php -v
PHP 5.3.0-3 with Suhosin-Patch (cli) (built: Aug 26 2009 08:01:52)
...
$ php benchmark.php
PHP hashing: 0.19465494155884 (seconds)
MD5 hashing: 0.57781004905701 (seconds)
799994

The code:

<?php
error_reporting(E_ALL);

$queries = array("SELECT",
                 "INSERT",
                 "UPDATE",
                 "DELETE",
                 );
$query_length = 256;
$num_queries  = 256;
$iter = 10000;

for ($i = 0; $i < $num_queries; $i++) {
    $q = implode('',
           array_map("chr",
             array_map("rand",
                       array_fill(0, $query_length, ord("a")),
                       array_fill(0, $query_length, ord("z")))));
    $queries[] = $q;
}

echo count($queries), "\n";

// Benchmark 1: use the raw query string itself as the cache key.
$cache = array();
$side_effect1 = 0;
$t = microtime(true);
for ($i = 0; $i < $iter; $i++) {
    foreach ($queries as $q) {
        if (!isset($cache[$q])) {
            $cache[$q] = $q;
        }
        else {
            $side_effect1++;
        }
    }
}
echo microtime(true) - $t, "\n";

// Benchmark 2: md5() the query string and use the hash as the cache key.
$cache = array();
$side_effect2 = 0;
$t = microtime(true);
for ($i = 0; $i < $iter; $i++) {
    foreach ($queries as $q) {
        $md5 = md5($q);
        if (!isset($cache[$md5])) {
            $cache[$md5] = $q;
        }
        else {
            $side_effect2++;
        }
    }
}
echo microtime(true) - $t, "\n";

echo $side_effect1 + $side_effect2, "\n";
Inshallah
Thanks for the benchmark; sadly I couldn't get it to work here as $cache never had any data and $q was only "DELETEArray", but I will investigate this further as soon as I have the opportunity.
dbemerlin
@dbemerlin, yes, sorry. It should work now. I edited the code a bit after copy/pasting from emacs, introducing some errors in the process :-).
Inshallah
I tested it now, based on your script, with these modifications: create 100 random strings by concatenating, 20 times, a random string from $queries. For each of these random strings, loop $iter times and each time check if the string exists in $cache; if not, set it. For md5: for each of the random strings create the hash once and then loop $iter times doing the same. Result: with $iter=1 the md5 version is 0.00008ms slower; for low values of $iter (100 and lower) there is no real performance difference (<0.1s); with an increasing number of iterations md5 gains more of an advantage. So: it depends on the use case.
dbemerlin
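
A rough reconstruction of the variation dbemerlin describes (hypothetical; the numbers above come from his own version). The point of interest is that the md5 hash is computed once per string, outside the inner loop, so it can be reused across iterations:

<?php
// Build 100 long strings, each from 20 random picks out of $queries.
$queries = array("SELECT", "INSERT", "UPDATE", "DELETE");
$strings = array();
for ($i = 0; $i < 100; $i++) {
    $s = '';
    for ($j = 0; $j < 20; $j++) {
        $s .= $queries[array_rand($queries)];
    }
    $strings[] = $s;
}

$iter = 10000;

// Variant 1: key the cache by the raw string inside the inner loop.
$cache = array();
$t = microtime(true);
foreach ($strings as $s) {
    for ($i = 0; $i < $iter; $i++) {
        if (!isset($cache[$s])) {
            $cache[$s] = true;
        }
    }
}
echo 'string keys: ', microtime(true) - $t, "\n";

// Variant 2: compute the md5 hash once per string, then reuse it in the loop.
$cache = array();
$t = microtime(true);
foreach ($strings as $s) {
    $md5 = md5($s);
    for ($i = 0; $i < $iter; $i++) {
        if (!isset($cache[$md5])) {
            $cache[$md5] = true;
        }
    }
}
echo 'md5 keys:    ', microtime(true) - $t, "\n";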
I ran Inshallah's benchmark with varied `$length` and `$iter` values and it seems PHP's own hashing wins every time. I guess I'll be accepting his answer, as he was right in the end, having pointed out the final solution (use whole queries as keys in the cache) in the comments of the original question. When I finish the wrapper class, I'll try to post further benchmarks on whether it was worth caching statements at all (can't promise anything though :)
Raveren
@dbemerlin, thanks for your feedback. I pumped up the number of queries to 256 with some really long ones but, for me, MD5 consistently lost.
Inshallah
I'd recommend benchmarking with md5-strings too, as they might (or might not) be faster for your use case, especially if you have the possibility to reuse the hash (i.e. in a loop), and they might (or might not) be faster if you have more than the 4 different queries in Inshallah's benchmark.
dbemerlin
You guys are aware that associative arrays are implemented as /hashmaps/? Since the resulting md5 hash is then hashed again with whatever PHP uses internally to maintain its hashmaps, there can only be a performance gain in using md5 up front if a) PHP's internal hashing performs worse than the exposed md5 implementation (including all the overhead of calling the exposed md5 function, allocating enough memory on the heap for the resulting md5 hash, etc.) and b) the input to md5 is significantly larger than an md5 hash - no matter what (because the resulting md5 will be hashed again).
Daniel Baulig
@Daniel I have no knowledge whatsoever about how PHP implements associative arrays, but I thought it was something like that. :-)
Inshallah
A: 

Using an MD5 hash as a key, you could eventually get two queries that result in the same MD5 hash. The probability is not high, but it could happen. Don't do it. Lossy hashing algorithms like MD5 are only meant as a way to tell with high certainty whether two objects are different; they are not a safe means of identifying something.

Daniel Alvarez Arribas