In Oracle, which clause types get evaluated first? If I had the following (pretend the dots represent valid expressions and relation names), what would the order of evaluation be?

SELECT   ...
FROM     .....
WHERE    ........
GROUP BY ...........
HAVING   .............
ORDER BY ................

I am under the impression that the SELECT clause is evaluated last, but other than that I'm clueless.

+2  A: 

The select list cannot always be evaluated last, because the ORDER BY clause can use aliases that are defined in the select list, so the ORDER BY must be evaluated after the select list. For example:

SELECT foo+bar foobar FROM table1 ORDER BY foobar

I'd say that in general the order of execution could be something like this:

  • FROM
  • WHERE
  • GROUP BY
  • SELECT
  • HAVING
  • ORDER BY
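To make the list concrete, here is a rough Python sketch that walks a tiny in-memory table through the steps in that order. It is only a simulation of the logical order, not how Oracle actually executes anything. The `orders` data, the column names, and the query it mimics (`SELECT dept, SUM(amount) total FROM orders WHERE amount > 0 GROUP BY dept HAVING SUM(amount) > 4 ORDER BY total DESC`) are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows standing in for a table; names and values are made up.
orders = [
    {"dept": "a", "amount": 5},
    {"dept": "a", "amount": 9},
    {"dept": "b", "amount": 3},
    {"dept": "b", "amount": -4},
    {"dept": "c", "amount": 20},
]

def run_query(table):
    # FROM: pick the row source.
    rows = list(table)
    # WHERE: drop rows before any grouping happens.
    rows = [r for r in rows if r["amount"] > 0]
    # GROUP BY: bucket the surviving rows by the grouping column.
    groups = defaultdict(list)
    for r in rows:
        groups[r["dept"]].append(r["amount"])
    # SELECT: compute the output columns, including the alias "total".
    selected = [{"dept": d, "total": sum(v)} for d, v in groups.items()]
    # HAVING: keep only groups whose aggregate passes the filter.
    selected = [r for r in selected if r["total"] > 4]
    # ORDER BY: sort on the alias, which exists only after SELECT.
    return sorted(selected, key=lambda r: r["total"], reverse=True)

result = run_query(orders)
```

The last step sorts on `total`, which is only available once the select step has produced it.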

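The HAVING and ORDER BY steps could be swapped without changing the result. The GROUP BY and WHERE steps could be swapped only when the WHERE predicate refers solely to grouping columns; a predicate on any other column has to be applied before the rows are grouped.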
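A quick way to test which steps can trade places is to try both orders on a few rows. The Python sketch below does that with made-up `(dept, amount)` data. It is a plain simulation, not Oracle, and the helper `group_sum` is hypothetical.

```python
from collections import defaultdict

# Hypothetical (dept, amount) rows; the data is made up for illustration.
rows = [("a", 5), ("a", -2), ("b", 3), ("b", 4)]

def group_sum(pairs):
    """Simulate GROUP BY dept with SUM(amount)."""
    totals = defaultdict(int)
    for dept, amount in pairs:
        totals[dept] += amount
    return dict(totals)

# Predicate on the grouping column: filter-then-group equals group-then-filter.
before = group_sum([(d, a) for d, a in rows if d != "b"])
after = {d: t for d, t in group_sum(rows).items() if d != "b"}
same_for_group_column = before == after

# Predicate on a non-grouping column: it has to run before grouping, because
# the individual amounts no longer exist once the rows are collapsed.
filtered_first = group_sum([(d, a) for d, a in rows if a > 0])
grouped_first = group_sum(rows)
same_for_row_column = filtered_first == grouped_first

# A HAVING-style filter and an ORDER BY-style sort commute.
totals = list(group_sum(rows).items())
filter_then_sort = sorted([t for t in totals if t[1] > 3], key=lambda t: t[1])
sort_then_filter = [t for t in sorted(totals, key=lambda t: t[1]) if t[1] > 3]
```

Here `same_for_group_column` comes out true and `same_for_row_column` comes out false, while the filter and sort give the same list in either order.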

In reality things are more complex because the database can reorder the execution according to different execution plans. As long as the result remains the same it doesn't matter in what order it is executed.

Note also that if an index is chosen for the ORDER BY clause the rows could already be in the correct order when they are read from disk. In this case the ORDER BY clause isn't really executed at all.
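As a toy illustration of that last point, the Python sketch below keeps a sorted list of (key, row position) pairs as a stand-in for an index. A real database would use a B-tree, and the table contents here are invented. Reading the rows through the index returns them already sorted, so no separate sort is needed.

```python
import bisect

# Hypothetical table rows and a sorted (key, row position) list playing the
# role of an index on "name"; a real database would use a B-tree.
table = [{"name": "carol"}, {"name": "alice"}, {"name": "bob"}]
index = []
for pos, row in enumerate(table):
    bisect.insort(index, (row["name"], pos))

# Walking the index yields rows already in ORDER BY name order: no sort step.
via_index = [table[pos] for _, pos in index]

# The explicit sort that the index walk makes unnecessary.
via_sort = sorted(table, key=lambda r: r["name"])
```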

Mark Byers
I'm gonna say I agree with you. I'd be curious to see if anyone else has information on the contrary.
jon.johnson
`GROUP BY` is also dependent on the `SELECT` items. Also, in practice, it seems that `ORDER BY` comes at the end, as evidenced by a select statement with a `rownum < n` predicate in the `WHERE` clause. In this case, the first n arbitrarily ordered rows are taken from the top of the result set, and only then is the ordering applied.
akf
You can't GROUP without doing WHERE first. GROUP needs an aggregate function in SELECT, and you can't feed rows that don't satisfy WHERE into that aggregate.
Mark Brackett
GROUP doesn't NEED an aggregate in the select. You can do SELECT cust_id FROM orders GROUP BY cust_id HAVING COUNT(*) > 10; You can even do it without the HAVING (making it, in effect, a DISTINCT).
Gary
I'd say the FROM/WHERE is very grey. Assuming the FROM includes join conditions (ANSI style), the optimizer isn't constrained as to whether it applies WHERE clause predicates before or after FROM clause predicates.
Gary
Hints (which appear in the SELECT clause) would have to be evaluated early in the parsing operation, but I wouldn't expect anything else from the SELECT to be used in generating the plan. After that, the FROM and WHERE would both have to be evaluated in the same part of the plan generation to establish the most efficient plan.
symcbean
@Gary - Fair enough, but my point stands: you have to do WHERE before GROUP BY. Not including the aggregate just lets you skip the calculated column, it still doesn't allow you to include rows not matching the predicate (WHERE clause) in the intermediate result that gets GROUPed. (Obviously, an optimizer is free to make whatever substitutions and rewrites it wants while guaranteeing an equivalent result. Talking about order of execution is almost always "logical order", not necessarily "actual order". Compilers and optimizers have a lot of free rein to rearrange things that we won't notice.)
Mark Brackett
+3  A: 

That's what execution plans are for. But, generally, there's only 1 way to do it. I'll ignore optimizations for the moment:

  • FROM to get the table involved
  • Start scanning the table in FROM, keeping those that pass WHERE clause
  • SELECT unaggregated columns
  • Calculate aggregated columns with GROUP BY
  • Keep those grouped results that pass HAVING clause
  • Order results with ORDER BY

Optimizations could cause some "peeking" to make better decisions (e.g., it'd be a good idea to check the WHERE clause before scanning the table, since an index may be available).
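Here's a small Python sketch of that kind of "peeking", assuming a hash map from column value to row positions as a stand-in for an index. The table, the `city` column, and both function names are hypothetical, and nothing here reflects how Oracle implements its indexes. The point is only that both plans return the same rows, while the second one never touches the rows that don't match.

```python
from collections import defaultdict

# Hypothetical table; "city" is the column used in the WHERE clause.
table = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": "Rome"},
    {"id": 3, "city": "Oslo"},
]

# A hash map from column value to row positions stands in for an index.
city_index = defaultdict(list)
for pos, row in enumerate(table):
    city_index[row["city"]].append(pos)

def full_scan(city):
    # Naive plan: read every row, then test the WHERE predicate.
    return [r for r in table if r["city"] == city]

def index_lookup(city):
    # "Peeking" plan: consult the predicate first and touch matching rows only.
    return [table[pos] for pos in city_index.get(city, [])]
```

For example, `full_scan("Oslo")` and `index_lookup("Oslo")` return the same two rows.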

I believe most RDBMSs solve this with a pre-pass through an optimizer, which basically rewrites the query to take advantage of indexes, remove redundant expressions, etc. This optimized query is then used to actually build the execution plan. There's also parallelism that could change the specifics, but the basics are the same.

Mark Brackett