Jeff Atwood once wrote that he found querying the database for primary keys first, and then fetching all the relevant fields with an IN clause, to be twice as fast as the equivalent single SQL statement.
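For concreteness, the two approaches being compared can be sketched roughly as follows. This is only an illustration: Python's built-in `sqlite3` stands in for the real database so the snippet runs anywhere, and the `posts` table, its columns, and the function names `single_query` and `two_step` are all made up for the example.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL / SQL Server; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, score INTEGER, title TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO posts (score, title, body) VALUES (?, ?, ?)",
    [(i % 10, f"title {i}", f"body {i}") for i in range(1000)],
)

def single_query(conn, min_score):
    """One statement: filter and fetch every column at once."""
    return conn.execute(
        "SELECT id, score, title, body FROM posts WHERE score >= ? ORDER BY id",
        (min_score,),
    ).fetchall()

def two_step(conn, min_score):
    """Step 1 fetches only the primary keys; step 2 fetches the rows with IN."""
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM posts WHERE score >= ?", (min_score,)
    )]
    if not ids:
        return []  # an empty IN () list is not valid SQL in every engine
    placeholders = ",".join("?" * len(ids))
    return conn.execute(
        "SELECT id, score, title, body FROM posts "
        f"WHERE id IN ({placeholders}) ORDER BY id",
        ids,
    ).fetchall()

# Both approaches should return the same rows; only the timing may differ.
print(single_query(conn, 8) == two_step(conn, 8))
```

Note that the second approach costs an extra round trip and is limited by however many bound parameters the engine accepts in one statement.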

I wonder whether this applies in all situations and, if not, in which cases it still offers a significant performance improvement.

Furthermore, how expensive is it to access a database through a scripting language library? I'm mostly thinking of the very popular PHP-MySQL combination.

+3  A: 

It depends. Sometimes, as Jeff's blog post clearly indicates, it can provide a (significant) performance boost. But as a general rule, it's better to let the query optimizer find the best execution plan it can, and then try to manually optimize particularly slow queries.

From the article, "We default to the built-in Linq language constructs, and drop down to hand-tuning ye olde SQL blobs where the performance traces tell us we need to." Likewise, you should default to the query optimizer doing what it does, and drop down to hand tuning your SQL statements where the performance traces tell you you need to.

Connecting to a database engine from a scripting language is generally very fast. Usually, executing the queries takes far longer than connecting to the database server and moving the results back to the requesting script.
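If you want to check this for your own setup, a minimal sketch of timing the connection separately from the query might look like the following. It uses Python's `sqlite3` purely as a stand-in, which is an assumption on my part: SQLite runs in-process with no network round trip, so the numbers say nothing about MySQL. The helper `timed` and the table `t` are hypothetical; the point is only the measuring pattern, which you would repeat with your real driver and server.

```python
import sqlite3
import time

def timed(fn):
    """Call fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

# Time the connection on its own.
conn, connect_s = timed(lambda: sqlite3.connect(":memory:"))

# Hypothetical table with enough rows to make the query measurable.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO t (payload) VALUES (?)",
    [("x" * 100,) for _ in range(10000)],
)

# Time a query that has to scan every row and ship the results back.
rows, query_s = timed(
    lambda: conn.execute("SELECT * FROM t WHERE payload LIKE '%x%'").fetchall()
)

print(f"connect: {connect_s * 1e3:.3f} ms, query + fetch: {query_s * 1e3:.3f} ms")
```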

James McNellis
Thanks, very insightful response. What do you think: is this trick more useful when the number of columns is higher, or when the number of records is higher?
pestaa
I honestly don't know. What Jeff describes is rather perverse behavior for a database engine; once you have your record pointers from the index, selecting n records should, on average, never take more than n times as long as selecting a single record.
James McNellis
@James When you say connecting to a database is fast, do you mean that a connection pool is used, so the cost is almost eliminated?
martin clayton
I mean "fast" as in "fast relative to the time it takes to execute queries."
James McNellis
A: 

Retrieving data via a key is always going to be quicker when grabbing data from a table. It's just how databases work; grabbing indexed data is quicker than grabbing non-indexed data. And getting only the key can be faster since all the DB engine has to do is "unroll" data from the index into a result set.

As for your "expensive" question, I'm assuming you mean "is it slow". I've not found that to be the case. One of the most computationally expensive parts of a query is opening the connection, but most (if not all) modern databases use some form of connection caching, so it's not that expensive any more. As for the queries themselves, the only real overhead is going to be network latency, so you should see queries take about the same time, or not much longer, than if you were querying from a non-scripting language (milliseconds, in other words).

Michael Todd
Getting only the key is definitely faster, but you have to buy another return ticket to the db for the data as well.
pestaa
+2  A: 

Jeff Atwood is talking about SQL Server, not MySQL. SQL optimisations are notoriously dependent on the DBMS, the configuration, the query, the data, and the state of the cache. Other than saying that selecting just the primary key fields will be at least as fast as selecting the entire row, it's hard to generalise. Certainly it's hard to generalise to any degree that would be useful. You'll have to benchmark your particular case.

Based on my experience with MySQL, I'd be surprised if selecting the details with an IN query were faster than doing a SELECT * in the first place. My understanding is that SELECT * is more expensive than SELECT id because MySQL has to look up the index data in both cases, but in the former case has to do the additional step of fetching the data that constitutes the rest of the row, which may require further disk seeks (especially since the table data is less likely to be in the cache than the index). However, InnoDB is a special case: the primary key is a clustered index, so the row data is stored alongside the index entry. In this case, I believe the SELECT * will be almost the same speed as SELECT id.
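A rough sketch of the kind of benchmark I mean, comparing SELECT id against SELECT * on the same table, is below. Treat it as a template only: it uses Python's `sqlite3` so it runs without a server, which is an assumption and not how MySQL or InnoDB behaves, and the `items` table and the `best_time` helper are invented for the example. Point the same harness at your real database and data to get numbers that mean something.

```python
import sqlite3
import timeit

# Assumption: sqlite3 stands in for the real DBMS; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO items (name, description) VALUES (?, ?)",
    [(f"item {i}", "d" * 200) for i in range(5000)],
)

def best_time(sql, repeat=3, number=10):
    """Return the fastest average time per execution, in seconds."""
    runs = timeit.repeat(
        lambda: conn.execute(sql).fetchall(), repeat=repeat, number=number
    )
    # Taking the minimum reduces noise from other activity on the machine.
    return min(runs) / number

keys_only = best_time("SELECT id FROM items")
full_rows = best_time("SELECT * FROM items")

print(f"SELECT id: {keys_only * 1e3:.3f} ms, SELECT *: {full_rows * 1e3:.3f} ms")
```

Run it against both a cold and a warm cache if you can, since the state of the cache is one of the things the result depends on.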

Tim
Looking at your reputation it feels that there are still some fascinating dev guys out there. I'm glad you joined, welcome to Stack Overflow.
pestaa
Thanks for the welcome! I find it hard to answer questions on StackOverflow, as usually someone else answers before I finish typing! There's a lot of smart guys here.
Tim