ansaurus

Question

Overhead for MySQL SELECTS - Better to Use One, or Many In Sequence

Answer 1

+5 A:

It's almost always faster to do one big batch SELECT and parse the results in your application code than doing a massive amount of SELECTs for one row. I would recommend that you implement both and profile them, though. Always strive to minimize the number of assumptions you have to make.

Emil H 2009-12-05 17:05:28

Thanks! I suspected as much. Could you give a sense of *why* it's better though, for example because of the connection overhead? I'd like a better understanding of how exactly one way is better.

anschauung 2009-12-05 20:34:22

CPU and Memory is always less costly than any form of IO, including network IO. Think about it like this: It's definitely less expensive to send a single order for 500 books and receive them in one huge package over the mail, than to send 500 orders and receive all the books in separate packaging. :)

Emil H 2009-12-06 02:46:12

Answer 2

+1 A:

It seems like you know what the 500 id values are, so why not do something like this:

// Assuming you have already validated that this array contains only integers
// so there is not risk of SQl injection

$ids = join(',' $arrayOfIds);

$sql = "SELECT `foo`, `bar` FROM `users` WHERE `id` IN ($ids)";

Doug Neiner 2009-12-05 17:13:10

I don't know the IDs in this case, but that's a neat technique. I'll have to remember it.

anschauung 2009-12-05 17:16:00

If you don't have ids, how can you run `SELECT foo, bar, FROM users WHERE id = x`?

Doug Neiner 2009-12-05 17:25:49

I see what you mean. What I meant to write was that I can *get* the ids, but the most straightforward way to do so is to get them one at a time. I don't have them all in one handy batch.

anschauung 2009-12-05 17:49:58

So does that mean you are running 500x2 queries? 1 lookup for `id` and 1 SELECT with the `id` ?

Doug Neiner 2009-12-05 17:55:15

At the root of it, yes. As a practical matter, I'm plugging into a larger system that has the data arranged in a different format than I need for this particular operation. The larger application would have looked up the id anyway, so I'm not really counting that as a second query for the narrow purposes of this question

anschauung 2009-12-05 20:28:06

A list of 500 ids is not very big. I've created `where id in ( $list )` predicates having tens of thousands of items in the list.

memnoch_proxy 2009-12-06 07:37:17

Answer 3

+2 A:

I would not worry about the connection overhead of mysql queries too much, especially if you are not closing the connection between every query. Consider that if your query creates a temporary table, you've already spent more time in the query than the overhead of the query took.

I love doing a complex SQL query, personally, but I have found that the size of the tables, mysql query cache and query performance of queries that need to do range checking (even against an index) all make a difference.

I suggest this:

1) Establish the simple, correct baseline. I suspect this is the zillion-query approach. This is not wrong, and very likely helfully correct. Run it a few times and watch your query cache and application performance. The ability to keep your app maintainable is very important, especially if you work with other code maintainers. Also, if you're querying really large tables, small queries will maintain scalability.

2) Code the complex query. Compare the results for accuracy, and then the time. Then use EXPECT on the query to see what the rows scanned are. I have often found that if I have a JOIN, or a WHERE x != y, or a condition that creates a temporary table, the query performance could get pretty bad, especially if I'm in a table that's always getting updated. However, I've also found that a complex query might not be correct, and also that a complex query can more easily break as an application grows. Complex queries typically scan larger sets of rows, often creating temporary tables and invoke using where scans. The larger the table, the more expensive these get. Also, you might have team considerations where complex queries don't suit your team's strengths.

3) Share the results with your team.

Complex queries are less likely to hit the mysql query cache, and if they are large enough, don't cache them. (You want to save the mysql query cache for frequently hit queries.) Also, query where predicates that have to scan the index will not do as well. (x != y, x > y, x < y). Queries like SELECT foo, bar FROM users WHERE foo != 'g' and mumble < '360' end up doing scans. (The cost of query overhead could be negligible in that case.)

Small queries can often complete without creating temporary tables just by getting all values from the index, so long as the fields you're selecting and predicating on are indexed. So the query performance of SELECT foo, bar FROM users WHERE id = x is really great (esp if columns foo and bar are indexed like, aka alter table users add index ix_a ( foo, bar );.)

Other good ways to increase performance in your application would be to cache those small query results in the application (if appropriate), or doing batch jobs of a materialized view query. Also, consider memcached or some features found in XCache.

memnoch_proxy 2009-12-06 07:33:26

ansaurus

tags:

views:

answers:

Overhead for MySQL SELECTS - Better to Use One, or Many In Sequence

related questions