views: 327
answers: 3

I'm building some profiling information for a home-grown app. I'd like the debug page to show each query sent along with how many rows it examined, without assuming the slow_log is turned on, let alone parsing it.

Back in 2006, what I wanted was not possible. Is that still true today?

I see Peter Zaitsev has a technique where you start with:

FLUSH STATUS;

Then the query.

Then: SHOW STATUS LIKE "Handler%";

quote:

Handler_read_next=42250 means 42250 rows were analyzed during this scan

which sounds as though, if MySQL is only examining indexes, it should give you the number. But is there a set of status vars you can poll and add up to find out how many rows were examined? Any other ideas?
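Putting the pieces together, the measurement sequence would look something like this (mytable is just a stand-in for whatever query you're profiling; run all three statements on the same connection, since the counters are per-session):

FLUSH STATUS;                                 -- zero the session counters

SELECT  *
FROM    mytable
WHERE   id BETWEEN 100 AND 200;               -- the query being profiled

SHOW SESSION STATUS LIKE 'Handler_read%';     -- read the counters back

Each value then reflects only the profiled query (plus a little noise from SHOW itself), so summing the Handler_read_* values gives a rough rows-examined figure.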

+1  A: 

From the documentation:

Handler_read_rnd

The number of requests to read a row based on a fixed position. This value is high if you are doing a lot of queries that require sorting of the result. You probably have a lot of queries that require MySQL to scan entire tables or you have joins that don't use keys properly.

Handler_read_rnd_next

The number of requests to read the next row in the data file. This value is high if you are doing a lot of table scans. Generally this suggests that your tables are not properly indexed or that your queries are not written to take advantage of the indexes you have.

The Handler_read_rnd* counters mean reading actual table rows, as in a full scan.

Note that they will show nothing if an index scan is combined with a row lookup; that still counts as a key read.

For a schema like this:

CREATE TABLE mytable (id INT NOT NULL PRIMARY KEY, data VARCHAR(50) NOT NULL)

INSERT
INTO    mytable
VALUES  …

SELECT  id
FROM    mytable
WHERE   id BETWEEN 100 AND 200

SELECT  *
FROM    mytable
WHERE   id BETWEEN 100 AND 200

Here, both SELECT queries will increment Handler_read_key by 1 and Handler_read_next by 101, and leave Handler_read_rnd and Handler_read_rnd_next at 0, despite the fact that actual row lookups occur in the second query.

Quassnoi
+1  A: 

Prepend the query with EXPLAIN. In MySQL, that shows the query's execution plan: which tables are examined, as well as the number of rows examined for each table.

Here's the documentation.
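As a quick sketch (mytable is just a placeholder table name):

EXPLAIN
SELECT  *
FROM    mytable
WHERE   id BETWEEN 100 AND 200;

The rows column in the output is the optimizer's estimate of how many rows will be examined for each table in the plan.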

Thomas
EXPLAIN is helpful, but it does not tell you how many rows were actually examined.
joedevon
Sorry, but I have to disagree about EXPLAIN. I've found it to be particularly helpful when trying to examine execution paths of queries that involve joins. It will show the number of rows examined per table used in the execution path. I've used it many times to troubleshoot join conditions and index schemes.
Thomas
Find or create a table with 1 million rows and run SELECT DISTINCT(field) LIMIT 10..., making sure field is mostly, but not completely, unique. See how many rows EXPLAIN claims it looked at. Since DISTINCT is internally like GROUP BY, EXPLAIN believes it needs to examine all rows before returning unique ones, so it will tell you it looked at a million-some rows. But it will really only look at about 10 rows before the LIMIT is satisfied. If you still aren't sure it didn't analyze the million rows, remove the LIMIT and see how long the hard drive thrashes before it even begins to spit out results.
joedevon
You make an excellent point. I stand corrected.
Thomas
+1  A: 

It's slightly better than it was in 2006. You can issue SHOW SESSION STATUS before and after the query and then diff each of the Handler_read_* counts to tell the number of rows examined.

There's really no other way. While the server protocol has a flag to say whether a table scan occurred, it doesn't expose rows_examined. Even tools like MySQL's Query Analyzer have to work by running SHOW SESSION STATUS before/after (although I think it only runs SHOW SESSION STATUS after, since it remembers the previous values).
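If you can't reset the counters with FLUSH STATUS (say, in the middle of a request), the before/after approach looks something like this (mytable again stands in for your real query):

SHOW SESSION STATUS LIKE 'Handler_read%';   -- snapshot "before"

SELECT  *
FROM    mytable
WHERE   id BETWEEN 100 AND 200;             -- the query being profiled

SHOW SESSION STATUS LIKE 'Handler_read%';   -- snapshot "after"

Subtract each "before" value from its "after" value and sum the differences for a rows-examined figure; the SHOW statements themselves can nudge a few counters, so treat the result as approximate.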

I know it's not related to your original question, but there are other expensive components to queries besides rows_examined. If you choose to do this via the slow log, you should check out this patch:

http://www.percona.com/docs/wiki/patches%3Amicroslow%5Finnodb#changes%5Fto%5Fthe%5Flog%5Fformat

I can recommend looking for "Disk_tmp_table: Yes" and "Disk_filesort: Yes".

Morgan

Morgan Tocker
Thanks Morgan. So is there an algorithm guaranteed to get the absolute right count from the Handler_read_* columns no matter what "magical" subqueries are in your statement (since obviously you can't just add them up)? I'm surprised with all the great MySQL blogs out there that no one has written up the correct way to do this.
joedevon
I'm not 100% sure what you mean by "guaranteed to get the absolute right count from the Handler_read_* columns". It's true that Handler_read_* may be incremented multiple times because of missing subquery optimizations, but in my mind you still want that to count towards the final number, since it's not free (although it probably does have a higher chance of a cache hit). You might also consider looking at the statistics that come out of SHOW STATUS LIKE 'innodb_buffer_pool_read%'; (or SHOW STATUS LIKE 'key_read%';), which could show you cache misses.
Morgan Tocker