views:

28

answers:

2

Would mysql (innodb) support a higher rate (queries per second) of queries like (A) or (B)?

(A) SELECT * FROM t1 WHERE pkey BETWEEN 2000 and 2001 AND x > 300

In (A), the primary key selects a range of 800 rows. "x" is unindexed. there's one range lookup and 1 contiguous read of length 200kb.

(B) (SELECT * FROM t1 WHERE pkey BETWEEN 2000 and 2001 AND x > 300) UNION ALL (SELECT * FROM t1 WHERE pkey BETWEEN 3000 and 3001 AND x > 300)

In (B), the primary key selects a range of 200 rows. "x" is unindexed. there are two range lookups and 2 contiguous reads of length 50kb.

So to sum up, (A) has 2x the disk seeks, but 1/2th as much contiguous reading. Conversely, (B) has half the disk seeks but 2x as much contiguous reading.

In general I assume seeks are slow and contiguous reads are fast, but I assume that one extra seek is preferable to reading through 10MB of extra data. Where's the tradeoff point, roughly?

A: 

The optimiser should make the decision about how to implement the query. Just write it how you want it.

Use EXPLAIN to see roughly what it's done. It may be that it does two range scans on the index on pkey.

In general reading fewer rows is better. You can also keep more of them in the buffer pool. Two range scans is better than one in the general case.

I am assuming that your table t1 will not fit in memory entirely, in which case it's mostly academic.

MarkR
A: 

You really need to supplement your two options with the output from EXPLAIN... it doesn't just matter which is theoretically faster, it matters what optimizations MySQL is going to have available.

Let me guess for you:

a) The ranged pkey lookup is very efficient because it's on a clustered index. For everything that is in the range it reads "next, next next" to check if X matches.

b) This is a series of point lookups. But it creates a temporary table even though you think it could pipeline the results to you(!) http://www.facebook.com/note.php?note_id=276225210932

My vote is almost certainly (a).

Morgan Tocker