To optimize application speed, the standard advice is to minimize the number of queries an application makes to the database, consolidating them into fewer queries that each retrieve more, wherever possible.

However, this advice always comes with the caveat that data transferred is still data transferred: making fewer queries doesn't make the data they return free.

I'm in a situation where I can over-include in the query in order to cut down the number of queries, and simply discard the unwanted data in the application code.

Is there any rule of thumb for how much each query costs, so I know when to optimize the number of queries versus the size of queries? I've tried to Google for objective performance analysis data, but surprisingly haven't been able to find anything like that.
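
For concreteness, here's the kind of crude cost model I have in mind (the latency and bandwidth numbers below are made up for illustration, not measured):

```ruby
# Crude model: total time = round_trips * latency + bytes / bandwidth.
# Both constants are assumptions for illustration, not measurements.
LATENCY_S   = 0.002       # ~2 ms per database round trip (assumed)
BANDWIDTH_B = 10_000_000  # ~10 MB/s effective transfer rate (assumed)

def query_time(round_trips:, bytes:)
  round_trips * LATENCY_S + bytes.to_f / BANDWIDTH_B
end

# 100 narrow queries of ~1 KB each vs. 1 broad query returning 150 KB:
many_small = query_time(round_trips: 100, bytes: 100 * 1_000)
one_big    = query_time(round_trips: 1,   bytes: 150_000)

many_small  # => 0.21  (dominated by per-query latency)
one_big     # => 0.017 (despite transferring 50% more data)
```

Under assumptions like these, latency dominates, but I don't know where the real crossover point lies.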

Clearly this relationship will change with factors such as the size of the database, making this somewhat situation-specific, but surely it's not so specific that a broad sense of the landscape can't be drawn out?

I'm looking for general answers, but for what it's worth, I'm running an application on Heroku.com, which means Ruby on Rails with a Postgres database.

+1  A: 

If you're looking for a rule of thumb: whenever possible, filter, sort and page in the database query. The database is optimized for these kinds of operations (set operations).

Application code is best reserved for true business logic (and display logic, etc.).
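
A minimal sketch of the difference (plain Ruby standing in for application code; the table and data here are made up):

```ruby
Order = Struct.new(:id, :status, :total)

# Simulated table contents
orders = [
  Order.new(1, "open", 50), Order.new(2, "closed", 200),
  Order.new(3, "open", 120), Order.new(4, "open", 80),
]

# Filtering, sorting and paging in application code means every row
# crosses the wire before most of it is thrown away:
in_app = orders.select { |o| o.status == "open" }
               .sort_by { |o| -o.total }
               .first(2)
in_app.map(&:id)  # => [3, 4]

# The same set operations expressed in the query run inside the
# database, so only the two matching rows are ever transferred:
#   SELECT * FROM orders WHERE status = 'open' ORDER BY total DESC LIMIT 2
```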

Phil Sandler
+2  A: 

I'm firmly in the "get only what you need when you need it" camp.

Retrieving extra rows that you may or may not need (let's say, retrieving full order details when loading an order summary screen, just in case the user drills down) just results in a much more complex query, probably joining tables that won't be used most of the time.

As a DBA, the hardest queries to optimize are the ones that join a large number of tables together.

Retrieving extra columns isn't quite as bad, but sometimes the server can retrieve just a few key columns directly from a "covering index" rather than having to retrieve all columns from a base table.

I think the key to the advice you've heard is to avoid unnecessary round trips when you can get everything at once, not, as it sounds like you're reading it, to "get extra data just in case you need it".

Developers are so used to "modularizing" everything that it's not at all unusual to end up with a final web page that makes hundreds or even thousands of calls to the database just to load the page once. We have a commercial product in-house that we've measured makes over 50,000 calls to the database for a single action.

For an example (somewhat contrived), let's say you have an "Order Summary" page that includes an "order total" field, which is the sum of all items in the "Order Detail" table. The wrong approach is:

  1. Retrieve the list of orders from the Order Header table
  2. Programmatically loop through the orders
  3. For each order, execute a query to retrieve all order detail records
  4. Programmatically add up the order items to get the total, which is displayed in the grid

Sounds crazy, right? This is more common than you think, especially when you build data-bound logic into individual web-components. Much more efficient:

  1. Make a single call to the database, with a query something like:

    SELECT oh.OrderID, oh.OrderDate, SUM(od.LineTotal) AS OrderTotal
    FROM OrderHeader oh
    INNER JOIN OrderDetail od ON oh.OrderID = od.OrderID
    GROUP BY oh.OrderID, oh.OrderDate
    
  2. Display the results in the grid.
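
To make the round-trip difference concrete, here's a toy sketch in plain Ruby: a fake in-memory "database" that just counts calls (no real ORM, network, or SQL involved; the data is made up):

```ruby
# Toy in-memory "database" that counts round trips; illustrative only.
class FakeDb
  attr_reader :round_trips

  def initialize
    @round_trips = 0
  end

  # Every query, however small, costs one round trip.
  def query
    @round_trips += 1
    yield
  end
end

headers = [{ id: 1, date: "2010-11-01" }, { id: 2, date: "2010-11-02" }]
details = [
  { order_id: 1, line_total: 10 }, { order_id: 1, line_total: 15 },
  { order_id: 2, line_total: 40 },
]

# Wrong approach: one query for the headers, then one more per order.
db = FakeDb.new
totals = db.query { headers }.to_h do |h|
  items = db.query { details.select { |d| d[:order_id] == h[:id] } }
  [h[:id], items.sum { |d| d[:line_total] }]
end
db.round_trips  # => 3 (1 + one per order; grows with the order count)

# Right approach: one joined, grouped query does the math server-side.
db2 = FakeDb.new
joined = db2.query do
  details.group_by { |d| d[:order_id] }
         .transform_values { |rows| rows.sum { |d| d[:line_total] } }
end
db2.round_trips  # => 1
joined            # => {1=>25, 2=>40}
```

Both versions compute the same totals; the difference is that one makes a number of trips proportional to your data, and the other always makes one.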

BradC
So if I'm hearing you right, in terms of speed, the cost of transferring data generally far dominates the cost of additional queries? So in a contrived example, let's say I have a batch of 100 decently sized objects I want to retrieve, it'd almost always be the case that it's better to run 100 individual queries as opposed to a single query that would retrieve 101 (with an unwanted extra)?
William Jones
@williamjones No, not true at all. There is very significant cost for each round trip. The point is that *if you need the data anyway*, get it in as few trips as possible. *If you don't know whether you need the data or not*, then wait until you *actually do* need it.
BradC
Regarding your contrived example, of course it'd be better to get 101 rows in a single query than do 100 individual queries. In the real world, though, I've found it's much more common that developers will make a single query to retrieve 100,000 rows, then throw away 99,900. Or retrieve all 100 columns of a table, then only use 5. Pick your favorite contrived example :)
BradC