To optimize application speed, the standard advice is to minimize the number of queries an application makes to the database, consolidating them into fewer queries that each retrieve more, wherever possible.

However, this advice always comes with the caveat that data transferred is still data transferred: making fewer queries doesn't make the data they return free.

I'm in a situation where I can over-include in the query in order to cut down the number of queries, and simply discard the unwanted data in the application code.

Is there any rule of thumb for how much each query costs, so I know when to optimize the number of queries versus the size of queries? I've tried to Google for objective performance analysis data, but surprisingly haven't been able to find anything like that.
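
For concreteness, here's the kind of crude cost model I have in mind (the latency and bandwidth numbers below are made up for illustration, not measured):

```ruby
# Crude model: total time = round_trips * latency + bytes / bandwidth.
# Both constants are assumptions for illustration, not measurements.
LATENCY_S   = 0.002       # ~2 ms per database round trip (assumed)
BANDWIDTH_B = 10_000_000  # ~10 MB/s effective transfer rate (assumed)

def query_time(round_trips:, bytes:)
  round_trips * LATENCY_S + bytes.to_f / BANDWIDTH_B
end

# 100 narrow queries of ~1 KB each vs. 1 broad query returning 150 KB:
many_small = query_time(round_trips: 100, bytes: 100 * 1_000)
one_big    = query_time(round_trips: 1,   bytes: 150_000)

many_small  # => 0.21  (dominated by per-query latency)
one_big     # => 0.017 (despite transferring 50% more data)
```

Under assumptions like these, latency dominates, but I don't know where the real crossover point lies.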

Clearly this relationship will change with factors such as the size of the database, making this somewhat situation-specific, but surely it's not so specific that a broad sense of the landscape can't be drawn out?

I'm looking for general answers, but for what it's worth, I'm running an application on Heroku.com, which means Ruby on Rails with a Postgres database.

+1  A: 

If you're looking for a rule of thumb: whenever possible, filter, sort and page in the database query. The database is optimized for these kinds of operations (set operations).

Application code is best reserved for true business logic (and display logic, etc.).
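
A minimal sketch of the difference (plain Ruby standing in for application code; the table and data here are made up):

```ruby
Order = Struct.new(:id, :status, :total)

# Simulated table contents
orders = [
  Order.new(1, "open", 50), Order.new(2, "closed", 200),
  Order.new(3, "open", 120), Order.new(4, "open", 80),
]

# Filtering, sorting and paging in application code means every row
# crosses the wire before most of it is thrown away:
in_app = orders.select { |o| o.status == "open" }
               .sort_by { |o| -o.total }
               .first(2)
in_app.map(&:id)  # => [3, 4]

# The same set operations expressed in the query run inside the
# database, so only the two matching rows are ever transferred:
#   SELECT * FROM orders WHERE status = 'open' ORDER BY total DESC LIMIT 2
```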

Phil Sandler
+2  A: 

I'm firmly in the "get only what you need when you need it" camp.

Retrieving extra rows that you may or may not need (let's say, retrieving full order details when loading an order summary screen, just in case the user drills down) just results in a much more complex query, probably joining tables that won't be used most of the time.

As a DBA, the hardest queries to optimize are the ones that join a large number of tables together.

Retrieving extra columns isn't quite as bad, but sometimes the server can retrieve just a few key columns directly from a "covering index" rather than having to retrieve all columns from a base table.

I think the key to the advice you've heard is to avoid unnecessary round trips when you can get everything at once, not, as it sounds like you're reading it, to "get extra data just in case you need it".

Developers are so used to "modularizing" everything that it's not at all unusual to end up with a final web page that makes hundreds or even thousands of calls to the database just to load the page once. We have a commercial product in-house that we've measured makes over 50,000 calls to the database for a single action.

For an example (somewhat contrived), let's say you have an "Order Summary" page that includes an "order total" field, which is the sum of all items in the "Order Detail" table. The wrong approach is:

  1. Retrieve the list of orders from the Order Header table
  2. Programmatically loop through the orders
  3. For each order, execute a query to retrieve all order detail records
  4. Programmatically add up the order items to get the total, which is displayed in the grid

Sounds crazy, right? This is more common than you think, especially when you build data-bound logic into individual web-components. Much more efficient:

  1. Make a single call to the database, with a query something like:

    SELECT oh.OrderID, oh.OrderDate, SUM(od.LineTotal) AS OrderTotal
    FROM OrderHeader oh
    INNER JOIN OrderDetail od ON oh.OrderID = od.OrderID
    GROUP BY oh.OrderID, oh.OrderDate
    
  2. Display the results in the grid.
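
To make the round-trip difference concrete, here's a toy sketch in plain Ruby: a fake in-memory "database" that just counts calls (no real ORM, network, or SQL involved; the data is made up):

```ruby
# Toy in-memory "database" that counts round trips; illustrative only.
class FakeDb
  attr_reader :round_trips

  def initialize
    @round_trips = 0
  end

  # Every query, however small, costs one round trip.
  def query
    @round_trips += 1
    yield
  end
end

headers = [{ id: 1, date: "2010-11-01" }, { id: 2, date: "2010-11-02" }]
details = [
  { order_id: 1, line_total: 10 }, { order_id: 1, line_total: 15 },
  { order_id: 2, line_total: 40 },
]

# Wrong approach: one query for the headers, then one more per order.
db = FakeDb.new
totals = db.query { headers }.to_h do |h|
  items = db.query { details.select { |d| d[:order_id] == h[:id] } }
  [h[:id], items.sum { |d| d[:line_total] }]
end
db.round_trips  # => 3 (1 + one per order; grows with the order count)

# Right approach: one joined, grouped query does the math server-side.
db2 = FakeDb.new
joined = db2.query do
  details.group_by { |d| d[:order_id] }
         .transform_values { |rows| rows.sum { |d| d[:line_total] } }
end
db2.round_trips  # => 1
joined            # => {1=>25, 2=>40}
```

Both versions compute the same totals; the difference is that one makes a number of trips proportional to your data, and the other always makes one.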

BradC
So if I'm hearing you right, in terms of speed, the cost of transferring data generally far dominates the cost of additional queries? So in a contrived example, let's say I have a batch of 100 decently sized objects I want to retrieve, it'd almost always be the case that it's better to run 100 individual queries as opposed to a single query that would retrieve 101 (with an unwanted extra)?
William Jones
@williamjones No, not true at all. There is very significant cost for each round trip. The point is that *if you need the data anyway*, get it in as few trips as possible. *If you don't know whether you need the data or not*, then wait until you *actually do* need it.
BradC
Regarding your contrived example, of course it'd be better to get 101 rows in a single query than do 100 individual queries. In the real world, though, I've found it's much more common that developers will make a single query to retrieve 100,000 rows, then throw away 99,900. Or retrieve all 100 columns of a table, then only use 5. Pick your favorite contrived example :)
BradC