views:

125

answers:

4

Why is using a parameterized query to insert data into a table:

string queryString = "insert into product(id, name) values (@id, @name)";

faster than appending the values to the query string:

string queryString = "insert into product(id, name) values (" + _id + ", " + _name + ")";

?

When I use the command in a loop to insert 10K rows, the parameterized query is an order of magnitude faster than the other one.

I know a parametrized query has security and maintainability benefits, and it's the recommended way to use, but now I'm interested in an explanation on why is it that much faster?

+3  A: 

This is due to the database caching the query plans, which makes it faster.

For Sql server see this explanation

Execution Plan Caching and Reuse

astander
+4  A: 

Depending on the DB you're using, the usual reason is because the parameterized query only has to be compiled once, and the dynamic query version is recompiled on each use.

RickNZ
That has never been the case on Oracle, and SQL Server finally supported cached dynamic query plans by using `sp_executesql` since v2005: http://www.sommarskog.se/dynamic_sql.html#queryplans
OMG Ponies
Automatic query plan caching of dynamic SQL in SQL Server only works with a single parameter (auto-parameterization); differences beyond that are considered as part of the key for the plan cache.
RickNZ
+3  A: 

Simple. Parsing and preparing execution plan for a query takes a lot of time even before query execution starts.

When you append parameters as text to query, each query is different, so DB needs to parse it and prepare execution plan.

When you use parameters, you are sending the same query many times (with different data) and DB can simply reuse execution plan from earlier call.

In most situation that's simply text comparison between queries. For example in MS SQL Server it's enough to change case of a letter or add a space at end of query to force DB to recreate execution plan.

Tomek Szpakowicz
+3  A: 

In general, the most expensive part of performing an SQL query is building the execution plan - identifying which tables are going to be needed, determining the best indexes (if any) to use, etc. You can think of this as "compiling" the query if you like.

When you use a parametrized query, you can prepare it once and then just plug in different target values. Since it's the same operation with different data, there's no need to rebuild the execution plan each time. To extend the "compiling" metaphor, this is like re-running the same program with a different configuration file.

When you append the values, though, you're hardcoding them into the query, so it has to be re-prepared each time and you incur the cost of building a new execution plan for each iteration. Again with the "compiling" metaphor, this is like a C program with all of its configuration hardcoded - change one setting, and you have to recompile the whole thing.

(The other major cost you can run into when doing mass inserts is updating the indexes. If your table is indexed, you might want to try turning them off, doing your inserts, and turning them back on so it only has to be reindexed once instead of after each row is added.)

Dave Sherohman
Building the execution plan is the most expensive part of performing a SQL query only in certain limited conditions -- definitely not in any general sense.
RickNZ
Do have links to any documentation regarding how limited those conditions are? I was, admittedly, just repeating what I've been told since it seemed plausible, but have never attempted to benchmark the breakdown of which piece of running a query takes longer, so I'd be interested in getting the actual facts. (Also, OP said he saw an order of magnitude runtime reduction from the parametrized version. That suggests rather strongly that, in this case, preparing the query *was* the most expensive part of the operation.)
Dave Sherohman
It's not uncommon for a query to take a minute or longer to run; some queries can take 24 hours+. Compilation normally takes well under a second. Of course, if your compiled query executes in milliseconds, then compilation can become a significant fraction of the total execution time, but it's certainly not true in the general case as your answer implies.
RickNZ
Ah, fair enough. You're right that I hadn't considered long-running queries; thanks for catching that. On the other hand, though, my experience has been that the vast majority of queries run by the typical application are in the milliseconds range (such as the OP's simple INSERT), which makes their performance characteristics true "in general" (which does have somewhat different semantics than talking about "the general case"). What do you think about a change from "in general" to "for simple queries"? Is that reasonably accurate? Or I'm open to other suggestions for phrasing.
Dave Sherohman