tags:

views:

101

answers:

4

My question is similar to this http://stackoverflow.com/questions/879893/sql-order-of-operations but with a little twist, so I think it's fair to ask.

I'm using Teradata. And I have 2 tables: table1, table2.

table1 has only an id column.
table2 has the following columns: id, val

I might be wrong but I think these two statements give the same results.

Statement 1.

SELECT table1.id, table2.val
FROM table1
INNER  JOIN table2
ON table1.id = table2.id
WHERE table2.val<100

Statement 2.

SELECT table1.id, table3.val
FROM table1
INNER JOIN (
    SELECT *
    FROM table2
    WHERE val<100
)  table3
ON table1.id=table3.id

My questions is, will the query optimizer be smart enough to
- execute the WHERE clause first then JOIN later in Statement 1
- know that table 3 isn't actually needed in Statement 2

I'm pretty new to SQL, so please educate me if I'm misunderstanding anything.

A: 

Unless I'm missing something, Why do you even need Table1??

Just query Table2

Select id, val  
From table2  
WHERE val<100 

or are you using the rows in table1 as a filter? i.e., Does table1 only copntain a subset of the Ids in Table2??

If so, then this will work as well ...

 Select id, val  
 From table2  
 Where val<100 
   And id In (Select id 
              From table1)

But to answer your question, Yes the query optimizer should be intelligent enough to figure out the best order in which to execute the steps necessary to translate your logical instructions into a physical result. It uses the strored statistics that the database maintains on each table to determine what to do (what type of join logic to use for example), as wekll as what order to perform the operations in in order to minimize Disk IOs and processing costs.

Charles Bretana
Well he is doing an inner join so he is limiting his result set to where the values exist in both tables.
JNK
+3  A: 

this would depend on many many things (table size, index, key distribution, etc), you should just check the execution plan:

you don't say which database, but here are some ways:
MySql EXPLAIN
SQL Server SET SHOWPLAN_ALL (Transact-SQL)
Oracle EXPLAIN PLAN

what is explain in teradata?
Teradata Capture and compare plans faster with Visual Explain and XML plan logging

KM
I'm pretty sure that Russell did say which DB. Its this one http://en.wikipedia.org/wiki/Teradata
Conrad Frix
@Conrad Frix , thanks read right past that, I've added links for that
KM
A: 

Q1. execute the WHERE clause first then JOIN later in Statement 1

The thing is, if you switch the order of inner join, i.e. table2 INNER JOIN table1, then I guess WHERE clause can be processed before JOIN operation, during the preparation phase. However, I guess even if you don't change the original query, the optimizer should be able to switch their order, if it thinks the join operation will be too expensive with fetching the whole row, so it will apply WHERE first. Just my guess.

Q2. know that table 3 isn't actually needed in Statement 2

Teradata will interpret your second query in such way that the derived table is necessary, so it will keep processing table 3 involved operation.

Wei
+1  A: 

Depending on the availability of statistics and indexes for the tables in question the query rewrite mechanism in the optimizer will may or may not opt to scan Table2 for records where val < 100 before scanning Table1.

In certain situations, based on data demographics, joins, indexing and statistics you may find that the optimizer is not eliminating records in the query plan when you feel that it should. Even if you have a derived table such as the one in your example. You can force the optimizer to process a derived table by simply placing a GROUP BY in your derived table. The optimizer is then obligated to resolve the GROUP BY aggregate before it can consider resolving the join between the two tables in your example.

SELECT table1.id, table3.val
FROM table1
INNER JOIN (
    SELECT table2.id, tabl2.val
    FROM table2
    WHERE val<100
    GROUP BY 1,2
)  table3
ON table1.id=table3.id

This is not to say that your standard approach should be to run with this through out your code. This is typically one of my last resorts when I have a query plan that simply doesn't eliminate extraneous records earlier enough in the plan and results in too much data being scanned and carried around through the various SPOOL files. This is simply a technique you can put in your toolkit to when you encounter such a situation.

The query rewrite mechanism is continually being updated from one release to the next and the details about how it works can be found in the SQL Transaction Processing Manual for Teradata 13.0.

RobPaller