tags:
views: 230
answers: 9

I have a table that contains maybe 10k to 100k rows, and I need varying sets of up to one or two thousand rows, but often far fewer. I want these queries to be as fast as possible, and I would like to know which approach is generally smarter:

  1. Always query for exactly the rows I need, with a WHERE clause that's different every time.
  2. Load the whole table into an in-memory cache inside my app, search there, and sync the cache regularly.
  3. Always query the whole table (no WHERE clause), let the SQL server handle the caching (it's always the same query, so it can cache the result), and filter the output as needed.

I'd like to be agnostic of a specific DB engine for now.
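For concreteness, option 1 might look like the following minimal sketch (SQLite via Python here purely as an example; the table, columns, and data are made up):

```python
import sqlite3

# Hypothetical table, columns, and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category TEXT, price REAL)")
conn.executemany("INSERT INTO items (category, price) VALUES (?, ?)",
                 [("book", 9.99), ("book", 24.50), ("toy", 4.25)])

# Option 1: parameterize the varying WHERE clause; only the matching
# rows come back, and the engine can use an index to find them.
rows = conn.execute(
    "SELECT id, price FROM items WHERE category = ? AND price < ?",
    ("book", 20.0),
).fetchall()
print(rows)
```

The same parameterized-query shape works in any engine; only the placeholder syntax (`?`, `%s`, `:name`) differs.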

+2  A: 

In my experience it is best to query for exactly what you want and let the database figure out the best way to do it. You can also examine the query plan to see whether you have any bottlenecks that could be helped by indexes.
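Examining the query plan can be sketched like this in SQLite, which exposes it via `EXPLAIN QUERY PLAN` (other engines have `EXPLAIN` / `EXPLAIN ANALYZE` equivalents; the schema here is made up):

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

# EXPLAIN QUERY PLAN reveals how SQLite intends to run the query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?", ("a@b.c",)
).fetchall()
detail = plan[0][3]
print(detail)  # a full table scan, since email has no index
```

Seeing a scan where you expected an index lookup is exactly the kind of bottleneck this technique surfaces.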

Otávio Décio
+7  A: 

With 10K to 100K rows, number 1 is the clear winner to me. If it were <1K rows I might say keep it cached in the application, but with this many rows, let the DB do what it was designed to do. With the proper indexes, number 1 is the best bet.

If you were pulling the same set of data over and over, caching the results might be a better bet, but when you are going to have a different WHERE clause every time, it is best to let the DB take care of it.

Like I said, though, just make sure you index all the appropriate fields well.
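Indexing the fields that appear in the varying WHERE clause can be sketched like this (SQLite again; table and index names are made up):

```python
import sqlite3

# Hypothetical table and index names, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
# Index the column the varying WHERE clause filters on.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# With the index in place, the query becomes an index lookup
# instead of a scan over all 10k-100k rows.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = ?", (42,)
).fetchall()
detail = plan[0][3]
print(detail)
```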

Ryan Guill
+3  A: 

I firmly believe option 1 should be preferred initially. When you encounter performance problems, you can look at how to optimize it using caching. ("Premature optimization is the root of all evil," as Dijkstra once said.)

Also, remember that if you choose option 3, you'll be sending the complete table contents over the network as well, which also hurts performance.

Frederik Gheysels
It was actually Knuth that said that. :-)
Jason Baker
+4  A: 

Seems to me that a system designed for rapid searching, slicing, and dicing of information is going to be a lot faster at it than the average developer's code. On the other hand, some factors you don't mention include the location or potential location of the database server in relation to the application - returning large data sets over slower networks would certainly tip the scales in favor of the "grab it all and search locally" option. I think that, in the 'general' case, I'd recommend querying for just what you want, but that in special circumstances, other options may be better.

Harper Shelby
A: 

If you do this:

SELECT * FROM users;

MySQL should perform two queries: one to find out the fields in the table, and another to bring back the data you asked for.

Doing

SELECT id, email, password FROM users;

MySQL only reaches for the data, since the fields are explicit.

About limits: it is always best to query only the rows you will need, no more, no less. More data means more time to transfer it.

Gabriel Sosa
Untrue. Any SQL interpreter will have to fetch the attribute information in order to tell if your list is correct. Otherwise, how would it know that SELECT asdkfljdsalkfjsdlakf isn't a valid attribute?
Matt Rogish
+1  A: 

The only reason to use anything other than option 1 is if the WHERE clause itself is huge (i.e. if your WHERE clause identifies each row individually, e.g. WHERE id = 3 OR id = 4 OR id = 32 OR ...).
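When the row set really is an arbitrary list of ids, the long chain of OR conditions can at least be written as a single parameterized IN list (a sketch with made-up names, using SQLite for illustration):

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO t (id, val) VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(1, 101)])

# Instead of "WHERE id = 3 OR id = 4 OR ...", bind the ids into one IN list.
wanted = [3, 4, 32]
placeholders = ",".join("?" * len(wanted))
rows = conn.execute(
    f"SELECT id, val FROM t WHERE id IN ({placeholders})", wanted
).fetchall()
print(rows)
```

For very large id lists, loading the ids into a temporary table and joining against it is a common alternative, since some engines limit the number of bound parameters.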

j_random_hacker
+2  A: 

First of all, let us dismiss #2. Searching tables is a database server's reason for existence, and it will almost certainly do a better job of it than any ad hoc search you cook up.

For #3, you just say "filter the output as needed" without saying where that filtering is done. If it's done in the application code, as in #2, then you have the same problem as #2.

Databases were created specifically to handle this exact problem. They are very good at it. Let them do it.

James Curran
A: 

Is anything else changing your data? The point about letting the SQL engine optimally slice and dice is a good one. But it would be surprising if you were working with a database without the possibility of "someone else" changing the data. If changes can be made elsewhere, you certainly want to re-query frequently.

sfuqua
A: 

Trust that the SQL server will do a better job of both caching and filtering than you can afford to do yourself (unless performance testing shows otherwise).

Note that I said "afford to do" not just "do". You may very well be able to do it better but you are being paid (presumably) to provide functionality not caching.

Ask yourself this: is spending time writing cache-management code helping you fulfil your requirements document?

Chris Nava