Nowadays, "prepared statements" seem to be the only way anyone recommends sending queries to a database. I even see recommendations to use prepared statements for stored procs. However, due to the extra query prepared statements require - and the short time they last - I'm persuaded that they are only useful for a run of INSERT/UPDATE queries.

I'm hoping someone can correct me on this, but it just seems like a repeat of the whole "tables are evil" CSS thing. Tables are only evil if used for layouts - not tabular data. Using DIVs for tabular data is a violation of W3C style guidelines.

Likewise, plain SQL (or that generated from ARs) seems to be much more useful for 80% of the queries used, which on most sites are a single SELECT that won't be repeated again that page load (I'm speaking about scripting languages like PHP here). Why would I make my over-taxed DB prepare a statement that it will only run once before it is removed?

MySQL:

A prepared statement is specific to the session in which it was created. If you terminate a session without deallocating a previously prepared statement, the server deallocates it automatically.

So at the end of your script PHP will auto-close the connection and you will lose the prepared statement, only to have your script re-create it on the next load.

Am I missing something or is this just a way to decrease performance?

:UPDATE:

It dawned on me that I am assuming new connections for each script. I would assume that if a persistent connection is used then these problems would disappear. Is this correct?

:UPDATE2:

It seems that even if persistent connections are the solution - they are not a very good option for most of the web - especially if you use transactions. So I'm back to square one, having nothing more than the benchmarks below to go on...

:UPDATE3:

Most people simply repeat the phrase "prepared statements protect against SQL injection", which doesn't fully explain the problem. The provided "escape" method for each DB library also protects against SQL injection. But it is more than that:

When sending a query the normal way, the client (script) converts the data into strings that are then passed to the DB server. The DB server then uses CPU power to convert them back into the proper binary datatype. The database engine then parses the statement and looks for syntax errors.

When using prepared statements... the data are sent in a native binary form, which saves the conversion-CPU-usage, and makes the data transfer more efficient. Obviously, this will also reduce bandwidth usage if the client is not co-located with the DB server.

...The variable types are predefined, and hence MySQL takes these characters into account, and they do not need to be escaped.

http://www.webdesignforums.net/showthread.php?t=18762

Thanks to OIS for finally setting me straight on this issue.
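To make the "data travels as data, not as SQL text" point concrete, here is a minimal sketch using PDO. It runs against an in-memory SQLite database so it works anywhere PDO is enabled (against MySQL the same code would additionally use the binary protocol); the table name and hostile test value are made up for illustration:

```php
<?php
// Sketch: a bound parameter is never part of the SQL text, so nothing
// inside it needs escaping. In-memory SQLite keeps this self-contained.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');

$name = "O'Brien; DROP TABLE users; --"; // hostile input, never escaped by us

$stmt = $db->prepare('INSERT INTO users (name) VALUES (:name)');
$stmt->execute([':name' => $name]);

// The value comes back verbatim: it was bound as data, so the quote and
// the SQL fragment inside it were never interpreted as statement text.
$row = $db->query('SELECT name FROM users')->fetchColumn();
echo $row;
```

The same code with string interpolation instead of binding would have needed a correct quoting step to survive that input.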

+4  A: 

Unlike the CSS tables debate, there are clear security implications with prepared statements.

If you use prepared statements as the ONLY way to put user-supplied data into a query, then they are absolutely bullet-proof when it comes to SQL injection.

longneck
Using the escape()/quote() method provided by the DB in your queries is bullet-proof also.
Xeoncross
Except when these values are passed into stored procedures which do not prepare statements (or otherwise sanitize their input)
Vinko Vrsalovic
@Xeoncross: that needs more explaining. Prepared statements are 100% secure because there is no escaping/quoting/conversion. You are not talking about mysql_real_escape_string, right?
OIS
@OIS Actually, I didn't think anyone still used mysql_real_escape_string() for current projects. I was referring to modern methods like PDO::quote().
Xeoncross
@Xeoncross: same thing. "If you are using this function to build SQL statements, you are strongly recommended to use PDO::prepare() to prepare SQL statements with bound parameters instead of using PDO::quote() to interpolate user input into a SQL statement. Prepared statements with bound parameters are not only more portable, more convenient, immune to SQL injection, but are often much faster to execute than interpolated queries, as both the server and client side can cache a compiled form of the query. Not all PDO drivers implement this method. Consider using prepared statements instead."
OIS
That is nice, but it doesn't actually name any problems, just "Consider using prepared statements". PDO::quote() works with all major DBs, which is all I will be using. And as far as I'm aware, it runs the same API cleanup that prepare runs, leaving no difference when it comes to *escaping* data.
Xeoncross
Prepare doesn't run a cleanup. Get with the program. It transmits the data as data, not strings. There is no quoting. Which is __WHY__ prepared statements are _100% safe_ and quoted strings are __NOT__.
OIS
Thank you! This is the kind of answer I have been looking for, I just need some more info to backup what you say.
Xeoncross
The best source is of course mysql in your case. http://dev.mysql.com/tech-resources/articles/4.1/prepared-statements.html
OIS
A: 

When using SQL queries like SELECT x,y,z FROM foo WHERE c='mary had a little lamb' the server has to parse the SQL statement including the data + you have to sanitize the "mary had..." part (a call to mysql_real_escape() or similar for each parameter). Using prepared statements the server has to parse the statement too, but without the data, and sends back only an identifier for the statement (a tiny tiny data packet). Then you send the actual data without first sanitizing it. I don't see the overhead here, though I freely admit I've never tested it. Have you? ;-)

edit: And using prepared statements can eliminate the need to convert each and every parameter (in/out) to strings. Probably even more so if your version of php uses mysqlnd (instead of the "old" libmysql client library). Haven't tested the performance aspect of that either.

VolkerK
Good point, I'll try to throw together a test.
Xeoncross
The overhead lies in the fact you have to send two requests to the server (query _then_ query_id + data) vs just one transfer of (query + data).
Xeoncross
But on the other hand you eliminate the escaping routines and the data can be sent in another (possibly "better") format. And it's still the same connection (usually within the bounds of the same LAN). Creating a new connection may be costly. But just sending one or two packets more back and forth between the server and client ...how costly is that? In a real-world scenario. My _guess_ is you'd need quite an elaborate test to account for all the factors involved (server configuration, buffers, latency, ...many more). All I can say is that I haven't noticed a _significant_ performance loss yet.
VolkerK
+1  A: 

When you execute an SQL statement on the database, the SQL parser needs to analyse it beforehand, which is exactly the same process as preparation.

So, comparing executing sql statements directly to preparing and executing has no disadvantages, but some advantages:

  • First of all, as longneck already stated, passing user input into a prepared statement escapes the input automatically. It is as if the database has prepared filters for the values and lets in only those values that fit.

  • Secondly, if you use prepared statements throughout and you come into the situation where you need to execute one multiple times, you don't need to rewrite the code to prepare and execute; you just execute it again.

  • Thirdly: The code becomes more readable, if done properly:


$sql = 'SELECT u.id, u.user, u.email, sum(r.points)
        FROM users u
        LEFT JOIN reputation r on (u.id=r.user_id)
        LEFT JOIN badge b on (u.id=b.user_id and badge=:badge)
        WHERE group=:group';

$params = array(
    ':group' => $group, 
    ':badge' => $_GET['badge']
);

$stmt = $pdo->prepare($sql);
$result = $stmt->execute($params);


Instead of


$sql = 'SELECT u.id, u.user, u.email, sum(r.points)
        FROM users u
        LEFT JOIN reputation r on (u.id=r.user_id)
        LEFT JOIN badge b on (u.id=b.user_id and badge="'.mysql_real_escape_string($_GET['badge']).'")
        WHERE group="'.mysql_real_escape_string($group).'"';

$result = mysql_query($sql);


Imagine you had to change the sql statement, which code would be your favourite? ;-)

Cassy
Actually, the second one would be - but with a call to $this->escape() instead of that ugly function. However, like I stated above, should a query actually *need* to be run more than once, I agree that PS are the way to go. However, for a query that is only run once, you incur a performance loss from that extra DB call.
Xeoncross
A: 

I don't seem to be finding any good benefits to using persistent connections - or prepared statements, for that matter. Look at these numbers - for 6000 select statements (which will never happen in one page request!) you can barely tell the difference. Most of my pages use fewer than 10 queries.

UPDATED: I just revised my test to include 4k SELECT and 4k INSERT statements! Run it yourself and let me know if there are any design errors.

Perhaps the difference would be greater if my MySQL server wasn't running on the same machine as Apache.

Persistent: TRUE
Prepare: TRUE
2.3399310112 seconds

Persistent: FALSE
Prepare: TRUE
2.3265211582184 seconds

Persistent: TRUE
Prepare: FALSE
2.3666892051697 seconds

Persistent: FALSE
Prepare: FALSE
2.3496441841125 seconds

Here is my test code:

$hostname = 'localhost';
$username = 'root';
$password = '';
$dbname = 'db_name';

$persistent = FALSE;
$prepare = FALSE;

try 
{

    // Force PDO to use exceptions for all errors
    $attrs = array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION);

    if($persistent) 
    { 
     // Make the connection persistent
     $attrs[PDO::ATTR_PERSISTENT] = TRUE;
    }

    $db = new PDO("mysql:host=$hostname;dbname=$dbname", $username, $password, $attrs);

    // What type of connection?
    print 'Persistent: '.($db->getAttribute(PDO::ATTR_PERSISTENT) ? 'TRUE' : 'FALSE').'<br />';
    print 'Prepare: '.($prepare ? 'TRUE' : 'FALSE').'<br />';

    //Clean table from last run
    $db->exec('TRUNCATE TABLE `pdo_insert`');

}
catch(PDOException $e)
{
    echo $e->getMessage();
}

$start = microtime(TRUE);

$name = 'Jack';
$body = 'This is the text "body"';

if( $prepare ) {

    // Select
    $select = $db->prepare('SELECT * FROM pdo_insert WHERE id = :id');
    $select->bindParam(':id', $x);

    // Insert
    $insert = $db->prepare('INSERT INTO pdo_insert (`name`, `body`, `author_id`) 
        VALUES (:name, :body, :author_id)');
    $insert->bindParam(':name', $name);
    $insert->bindParam(':body', $body);
    $insert->bindParam(':author_id', $x);

    $run = 0;
    for($x = 0; $x < 4000; ++$x)
    {
        if( $insert->execute() && $select->execute() )
        {
            $run++;
        }
    }

}
else
{

    $run = 0;
    for($x = 0; $x < 4000; ++$x) {

        // Insert
        if( $db->query('INSERT INTO pdo_insert (`name`, `body`, `author_id`)
            VALUES ('.$db->quote($name).', '.$db->quote($body).', '.$db->quote($x).')')

        AND

        // Select
        $db->query('SELECT * FROM pdo_insert WHERE id = '. $db->quote($x)) )
        {
            $run++;
        }

    }

}

print (microtime(true) - $start).' seconds and '.($run * 2).' queries';
Xeoncross
I'd like to see how you get 100% bulletproof, non-vulnerable-to-SQL-injection data without prepared statements.
OIS
By using the same internal method as the DB API uses for prepared statements: $db->quote($x)
Xeoncross
See other answer. PDO->quote is nowhere near 100% sql injection safe like prepared statements. Nor can it be used on all databases (which you never need, until you _need_ it, _now_).
OIS
If you think the DB just quotes strings when doing prepared statements I have a bridge to sell you.
Donnie
Sorry, OIS took all my money
Xeoncross
A: 

Prepared Statements come in handy in several situations:

  • Great separation of query data from untrusted user data.
  • Performance increase when the same query is executed multiple times
  • Performance increase when binary data is being transmitted as the prepared statement can use the binary protocol, whereas a traditional query will end up doing encoding and such.

There is a performance hit under normal circumstances (not repeated, no binary data) as you now have to do two back and forths. The first to "prepare" the query, and the second to transmit the token along with the data to be inserted. Most people are willing to make this sacrifice for the security benefit.
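The repeated-execution case above is the one where prepared statements pay off: the single prepare is amortized over every execute. A sketch in PDO (against in-memory SQLite so it is self-contained and runs anywhere PDO is enabled; the table name is made up):

```php
<?php
// Sketch of "prepare once, execute many": on a networked server the
// prepare() is one extra round trip, but each execute() then reuses
// the already-parsed statement with only the data being sent.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE scores (id INTEGER PRIMARY KEY, points INTEGER)');

// One prepare...
$stmt = $db->prepare('INSERT INTO scores (points) VALUES (:points)');

// ...then many executes against the same statement handle.
foreach (range(1, 100) as $points) {
    $stmt->execute([':points' => $points]);
}

echo $db->query('SELECT COUNT(*) FROM scores')->fetchColumn(); // prints 100
```

For a page that runs each query exactly once, this amortization never happens, which is the trade-off the answer describes.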

With regards to persistent connections: MySQL has one of the fastest connection build-up times on the market. It's essentially free for most setups, so you're not going to see much of a change from using persistent connections or not.

preinheimer
Query data and user data should be separated whether you use prepared statements or not. @Cassy's example with $_GET is very bad form. As to the performance increase - if you see my benchmarks, it's hard to call one method slower than the other (though please do if I'm missing something). However, that binary data transfer is a great plus that I hadn't thought of - I wonder if PDO makes use of it...?
Xeoncross
A: 

Cassy is right. If you don't prepare/compile it, the DBMS would have to anyway before it could run it.

Also, the advantage is you can check the prepare result, and if the prepare fails your code can branch off to handle the exception without wasting DB resources running the failing query.

Blessed Geek
Your second point is an interesting idea; however, it would only be useful while you were debugging/building your project (before you have tested the SQL syntax) and would only save you bandwidth. Of course, in *dev mode* bandwidth for errors hardly counts for anything. ;)
Xeoncross
I have to write web apps where sql statements are dynamically generated.
Blessed Geek
+1  A: 

The answer has to do with security and abstraction. Everyone else has already mentioned security, but the real upside is that your input is completely abstracted from the query itself. This allows for true database agnosticism when using an abstraction layer, whereas inlining the input is usually a database-dependent process. If you care anything for portability, prepared statements are the way to go.

In the real world, I rarely ever write DML queries. All of my INSERTS / UPDATES are automatically built by the abstraction layer and are executed by simply passing an input array. For all intents and purposes, there really is no "performance hit" for preparing queries and then executing them (save for connection latency in the initial PREPARE). But when using a UDS (Unix Domain Socket) connection, you're not going to notice (or even be able to benchmark) a difference. It's usually on the order of a few microseconds.

Given the security and abstraction upsides, I'd hardly call it wasteful.
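As a sketch of the kind of automatic INSERT-building described above - a hypothetical insert() helper, not any particular abstraction layer's API - the input array drives the placeholders and every value is bound:

```php
<?php
// Hypothetical helper of the kind an abstraction layer generates:
// placeholders are built from the array keys, values are always bound.
// NOTE: table/column names cannot be bound parameters, so they must
// come from trusted application code, never from user input.
function insert(PDO $db, string $table, array $row): bool
{
    $cols  = array_keys($row);
    $marks = implode(', ', array_fill(0, count($cols), '?'));
    $sql   = sprintf('INSERT INTO %s (%s) VALUES (%s)',
                     $table, implode(', ', $cols), $marks);

    return $db->prepare($sql)->execute(array_values($row));
}

// Demo against in-memory SQLite so the sketch is self-contained.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT)');

insert($db, 'posts', ['title' => 'Hello', 'body' => 'First post']);
echo $db->query('SELECT title FROM posts')->fetchColumn(); // prints Hello
```

Because only the statement shape is database-specific (and even that rarely), such a helper ports across PDO drivers without touching calling code.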

Kenaniah
Actually, with the advent of ActiveRecord (and other fun ORMs) along with procs - I don't think most people actually write SQL anymore. So whether your abstraction layer generates an SQL string and then runs a prepared statement - or just skips straight to running it - will make little difference to your code. Also, I'm not sure why the topic of "security" keeps popping up. Using the sanctioned escape() method ensures that the same code which would run on the prepared statement runs on your scalar value.
Xeoncross
Sorry, I was off about that security/quoting thing. Now that I have been corrected about that I would say that is (from everything I've read so far) the only reason to use PS as it also saves string/binary conversion time.
Xeoncross
Most people that are developing complex sql for reporting, data warehousing, business intelligence, (and to a lesser extent transactional apps) are either writing their own SQL or using an OLAP tool like Microstrategy or Cognos to generate the SQL. ORMs aren't helpful in this space.
KenFar
+1  A: 

The performance benefit doesn't come from less parsing - it comes from only having to calculate access paths once rather than repeatedly. This helps a lot when you're issuing thousands of queries.

Given MySQL's very simple optimizer/planner, this may be less of an issue than with a more mature database with a much more sophisticated optimizer.

However, this performance benefit can actually turn into a detriment if you've got a sophisticated optimizer that is aware of data skews. In that case you can often be better off with getting a different access path for the same query using different literal values rather than reusing a preexisting path.

KenFar
I take it that an example of this might be Postgres?
Xeoncross
really, pretty much any of the more sophisticated databases - oracle, db2, sybase, sql server, informix, postgresql, etc.
KenFar