NOTE: the original question is moot, but scroll to the bottom for the part that is still relevant.

I have a query I want to optimize that looks something like this:

select cols from tbl where col = "some run time value" limit 1;

I want to know which keys are being used, but whatever I pass to EXPLAIN, it manages to optimize the WHERE clause away ("Impossible WHERE noticed...") because I fed it a constant.

  • Is there a way to tell MySQL not to do constant optimizations in EXPLAIN?
  • Am I missing something?
  • Is there a better way to get the info I need?

Edit: EXPLAIN seems to give me the query plan that results from the constant values. Since the query is part of a stored procedure (and, IIRC, query plans in sprocs are generated before they are called), this does me no good, because the values are not constant. What I want to find out is which query plan the optimizer will generate when it doesn't know what the actual value will be.

Am I missing something?

Edit2: Asking around elsewhere, it seems that MySQL always regenerates query plans unless you go out of your way to make it reuse them, even in stored procedures. From this it would seem that my question is moot.

However, that doesn't make what I really wanted to know moot: how do you optimize a query that contains values that are constant within any specific query, but where I, the programmer, don't know in advance which value will be used? For example, say my client-side code generates a query with a number in its WHERE clause. Sometimes the number will result in an impossible WHERE clause; other times it won't. How can I use EXPLAIN to examine how well optimized the query is?

The best approach I can see right off the bat would be to run EXPLAIN for the full matrix of exist/non-exist cases. That isn't really a good solution, since it would be both hard and error-prone to do by hand.
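The "full matrix" can at least be generated by a script instead of typed by hand. A minimal sketch, assuming Python; `explain_matrix` and `quote_sql_literal` are hypothetical helpers, and the sample values are placeholders you would pick for your own data (for instance one value known to exist, one below and one above the range reported by `SELECT MIN(col), MAX(col) FROM tbl`, and one absent value inside that range). It only builds the statements; running them through the mysql client or your driver is left out, and the quoting is a toy, not a replacement for your driver's parameter binding.

```python
"""Generate one EXPLAIN per representative value of a parameterised query.

Hypothetical helpers for illustration: they only build SQL text and never
connect to MySQL.
"""


def quote_sql_literal(value):
    """Render an int or str as a MySQL literal (deliberately minimal)."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # Escape backslashes first, then double any single quotes.
        escaped = value.replace("\\", "\\\\").replace("'", "''")
        return "'" + escaped + "'"
    raise TypeError(f"unsupported literal type: {type(value).__name__}")


def explain_matrix(template, cases):
    """Return (label, statement) pairs; `template` holds one `{}` placeholder."""
    return [
        (label, "EXPLAIN " + template.format(quote_sql_literal(value)))
        for label, value in cases.items()
    ]


if __name__ == "__main__":
    template = "SELECT cols FROM tbl WHERE col = {} LIMIT 1"
    cases = {  # placeholder values: choose real ones for your table
        "existing value": 42,
        "below MIN(col)": -1,
        "above MAX(col)": 1_000_000_000,
        "absent, inside [MIN, MAX]": 43,
    }
    for label, statement in explain_matrix(template, cases):
        print(f"-- {label}\n{statement};")
```

Each printed statement is labelled with the case it covers, so the resulting plans can be compared side by side.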

+5  A: 

You are getting "Impossible WHERE noticed" because the value you specified is not in the column, not just because it is a constant. You could either 1) use a value that exists in the column or 2) just say col = col:

explain select cols from tbl where col = col;
Robert Gamble
Neither of those solves my problem. I want to know what the query plan is when the optimizer *doesn't know* whether the value is in the column.
BCS
Here's how it works: the optimizer determines whether the SELECT is possible by reading the const and system tables; if it is, you get the query plan. My solution will give you the query plan, because the optimizer won't stop early on the grounds that the query is impossible.
Robert Gamble
Yes, it will give you a query plan and avoid the issue of the WHERE clause never passing, but then it assumes that the WHERE clause will *always* pass, and that isn't the case either. What I want is to know how things will perform in both cases. (See Edit2.)
BCS
A: 

How do you optimize a query with values that are constant only to the query, but where I, the programmer, don't know in advance what value will be used?

By using indexes on the specific columns (or even on a combination of columns, if you always query those columns together). If you have indexes, the query planner will potentially use them.

Regarding "impossible" values: the query planner can conclude that a given value is not in the table from several sources:

  • if there is an index on the particular column, it can observe that the value is larger or smaller than any value in the index (min/max values take constant time to extract from an index)
  • if you are passing in the wrong type (for example, comparing a numeric column with a text value)

PS. In general, creating a query plan is not expensive, and it is better to re-create plans than to reuse them, since conditions might have changed since the plan was generated and a better plan might exist.

Cd-MaN
It seems you are answering the same question as everyone else, but not the one I am asking. I'll try, yet again, editing the question.
BCS
+3  A: 

For example, say my client-side code is generating a query with a number in its WHERE clause.

Sometimes the number will result in an impossible WHERE clause; other times it won't.

How can I use explain to examine how well optimized the query is?

MySQL builds different query plans for different values of bound parameters.

This article lists when the MySQL optimizer performs each step:

    Action                                      When

    Query parse                                 PREPARE
    Negation elimination                        PREPARE
    Subquery re-writes                          PREPARE

    Nested JOIN simplification                  First EXECUTE
    OUTER->INNER JOIN conversions               First EXECUTE

    Partition pruning                           Every EXECUTE
    COUNT/MIN/MAX elimination                   Every EXECUTE
    Constant subexpression removal              Every EXECUTE
    Equality propagation                        Every EXECUTE
    Constant table detection                    Every EXECUTE
    ref access analysis                         Every EXECUTE
    range/index_merge analysis and optimization Every EXECUTE
    Join optimization                           Every EXECUTE
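The key point of the table is that every value-dependent step (constant table detection, ref and range analysis, join optimization) is redone on every EXECUTE, which is why different bound values can produce different plans. A small sketch, assuming Python, that transcribes the table as data; `OPTIMIZER_STEPS` and `steps_run_by` are hypothetical names, and the stage assignments are exactly the ones listed above, not re-checked against any particular MySQL version.

```python
# The table above transcribed as data (stage names as given in the answer).
OPTIMIZER_STEPS = {
    "PREPARE": ["Query parse", "Negation elimination", "Subquery re-writes"],
    "First EXECUTE": ["Nested JOIN simplification", "OUTER->INNER JOIN conversions"],
    "Every EXECUTE": [
        "Partition pruning",
        "COUNT/MIN/MAX elimination",
        "Constant subexpression removal",
        "Equality propagation",
        "Constant table detection",
        "ref access analysis",
        "range/index_merge analysis and optimization",
        "Join optimization",
    ],
}


def steps_run_by(execution_number):
    """Optimizer steps performed by the n-th EXECUTE (1-based) of a prepared statement."""
    if execution_number < 1:
        raise ValueError("execution_number is 1-based")
    steps = list(OPTIMIZER_STEPS["Every EXECUTE"])
    if execution_number == 1:
        steps = OPTIMIZER_STEPS["First EXECUTE"] + steps
    return steps


if __name__ == "__main__":
    # From the second EXECUTE on, only the value-dependent steps are repeated.
    for step in steps_run_by(2):
        print(step)
```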

One more thing is missing from this list.

MySQL can rebuild a query plan on every JOIN iteration, using so-called range checking for each record (EXPLAIN shows it in the Extra column as "Range checked for each record").

If you have a composite index on a table:

CREATE INDEX ix_table2_col1_col2 ON table2 (col1, col2)

and a query like this:

SELECT  *
FROM    table1 t1
JOIN    table2 t2
ON      t2.col1 = t1.value1
        AND t2.col2 BETWEEN t1.value2_lowerbound AND t1.value2_upperbound

, MySQL will NOT use an index RANGE access from (t1.value1, t1.value2_lowerbound) to (t1.value1, t1.value2_upperbound). Instead, it will use an index REF access on (t1.value1) and just filter out the wrong values.

But if you rewrite the query like this:

SELECT  *
FROM    table1 t1
JOIN    table2 t2
ON      t2.col1 <= t1.value1
        AND t2.col1 >= t1.value1
        AND t2.col2 BETWEEN t1.value2_lowerbound AND t1.value2_upperbound

, then MySQL will recheck index RANGE access for each record from table1, and decide whether to use RANGE access on the fly.
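To see why a RANGE access pays off here, consider a toy model, assuming Python and in-memory data; it is not how MySQL is implemented. The composite index `ix_table2_col1_col2` is modelled as a sorted list of `(col1, col2, row_id)` tuples, and the hypothetical functions `ref_then_filter` and `range_scan` count how many index entries each access path reads. Because the bounds come from the current `table1` row, the range has to be recomputed for every outer row, which is what range checking for each record does.

```python
import bisect

# Toy model, not MySQL internals: index entries are (col1, col2, row_id)
# tuples kept sorted, standing in for the composite index ix_table2_col1_col2.


def ref_then_filter(index, value1, low, high):
    """REF access on col1 = value1, then filter col2; returns (row_ids, entries_read)."""
    matches, read = [], 0
    i = bisect.bisect_left(index, (value1,))  # first entry with col1 >= value1
    while i < len(index) and index[i][0] == value1:
        read += 1
        if low <= index[i][1] <= high:
            matches.append(index[i][2])
        i += 1
    return matches, read


def range_scan(index, value1, low, high):
    """RANGE access from (value1, low) to (value1, high); returns (row_ids, entries_read)."""
    matches, read = [], 0
    i = bisect.bisect_left(index, (value1, low))  # first entry >= (value1, low)
    while i < len(index) and index[i][0] == value1 and index[i][1] <= high:
        read += 1
        matches.append(index[i][2])
        i += 1
    return matches, read


def nested_loop_join(table1, index, access):
    """Probe the index once per table1 row, with bounds taken from that row."""
    joined, total_read = [], 0
    for row in table1:
        matches, read = access(
            index, row["value1"], row["value2_lowerbound"], row["value2_upperbound"]
        )
        total_read += read
        joined.extend((row["id"], row_id) for row_id in matches)
    return joined, total_read


if __name__ == "__main__":
    index = sorted([(1, c2, f"r{c2}") for c2 in range(10)] + [(2, 5, "x")])
    table1 = [{"id": "a", "value1": 1, "value2_lowerbound": 3, "value2_upperbound": 5}]
    for access in (ref_then_filter, range_scan):
        rows, read = nested_loop_join(table1, index, access)
        print(access.__name__, rows, "entries read:", read)
```

With this sample data both paths return the same three rows, but REF-then-filter reads all 10 entries with `col1 = 1`, while the range scan reads only 3.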

You can read about it in these articles in my blog:

All these things employ RANGE CHECKING FOR EACH RECORD

Returning to your question: there is no way to tell which plan MySQL will use for every given constant, since there is no plan before the constant is given.

Unfortunately, there is no way to force MySQL to use one query plan for every value of a bound parameter.

You can control the JOIN order and the indexes chosen by using the STRAIGHT_JOIN and FORCE INDEX clauses, but they will not force a particular access path on an index or forbid the IMPOSSIBLE WHERE.

On the other hand, for all JOINs, MySQL employs only NESTED LOOPS (as of this answer; MySQL 8.0.18 later added hash joins). That means that if you build the right JOIN order or choose the right indexes, MySQL will probably benefit from all the IMPOSSIBLE WHEREs.
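A toy nested-loop join, assuming Python, illustrates the effect: when the condition on the driving (outer) table never matches, the inner table is never probed at all. `nested_loops` is a hypothetical name, and the sketch counts inner probes rather than measuring time.

```python
def nested_loops(outer_rows, outer_condition, inner_lookup):
    """Toy nested-loop join; returns (joined_pairs, inner_probes)."""
    joined, probes = [], 0
    for outer in outer_rows:
        if not outer_condition(outer):
            continue  # filtered before the inner table is touched
        probes += 1
        joined.extend((outer, inner) for inner in inner_lookup(outer))
    return joined, probes


if __name__ == "__main__":
    outer_rows = list(range(1000))
    inner = {n: [n * 10] for n in range(1000)}  # stands in for an indexed table

    def lookup(n):
        return inner.get(n, [])

    _, probes_some = nested_loops(outer_rows, lambda n: n < 5, lookup)
    _, probes_none = nested_loops(outer_rows, lambda n: n < 0, lookup)
    print(probes_some, probes_none)  # 5 0
```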

Quassnoi
Nice commentary, but I think you are still skipping over my point: What I am/was looking for is a way to ask the query optimizer for query plans (plural) for different possible and impossible where clauses without having to back out values that trigger them. -- I could see a tool that just runs the optimizer and "forks" for every question asked and spits out every query plan that is generated (clearly some user pruning would be needed) so I can see what different plans I might end up with.
BCS
You mean all possible plans for all possible values? There are 2^32 integers alone, to say nothing of VARCHARs.
Quassnoi
WHERE 1 < 0; WHERE 1 < 1; WHERE 2 < 0; WHERE 2 < 1; WHERE 2 < 2 etc.: all of these are impossible WHEREs from MySQL's point of view.
Quassnoi
Also, directly searching for an absent PRIMARY KEY value is an impossible WHERE, caught when the optimizer reads the const tables: SELECT * FROM table1 WHERE id = 1 will result in an IMPOSSIBLE WHERE if id is the PRIMARY KEY and there is no record with id = 1.
Quassnoi
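A toy version of that const-table step, assuming Python: `explain_pk_equality` is a hypothetical function, a dict stands in for the PRIMARY KEY, and the returned strings mimic what EXPLAIN reports (`const` in the type column, or "Impossible WHERE noticed after reading const tables" in Extra). It also shows why the asker's EXPLAIN depends on the literal value: the row is looked up while the plan is being built.

```python
def explain_pk_equality(rows_by_pk, pk_value):
    """Toy model of planning `SELECT * FROM t WHERE id = <constant>`.

    An equality on the PRIMARY KEY matches at most one row, so the row is
    read during planning and the table is treated as a constant.
    """
    if pk_value in rows_by_pk:
        return "type: const"
    return "Extra: Impossible WHERE noticed after reading const tables"


if __name__ == "__main__":
    table1 = {2: ("row two",), 3: ("row three",)}
    for value in (1, 2):
        print(f"id = {value}:", explain_pk_equality(table1, value))
```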
Great explanation!
Yosef