views:

107

answers:

6

I am getting different result set for these two queries and second result set seems to be correct. What is the difference in these queries.

What type of inner join query second is?

1)

FROM TABLE1 t1
 INNER JOIN TABLE2 t2 ON t1.Id = t2.Id
WHERE
 t1.StatusId = 12

2)

FROM TABLE1 t1
 INNER JOIN TABLE2 t2 ON t1.Id = t2.Id
 AND t1.StatusId = 12
+5  A: 

These queries should be returning identical results. Please post the complete queries.

Below is a sample demonstrating the same output from the two variations:

Sample:

CREATE TABLE [dbo].[t1](
    [ID] [int] NULL,
    [StatusID] [int] NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[t2](
    [ID] [int] NULL
) ON [PRIMARY]
INSERT INTO t1 (ID, StatusID) VALUES (1, 10)
INSERT INTO t1 (ID, StatusID) VALUES (2, 11)
INSERT INTO t1 (ID, StatusID) VALUES (3, 12)
INSERT INTO t1 (ID, StatusID) VALUES (4, 12)
INSERT INTO t2 (ID) VALUES (1)
INSERT INTO t2 (ID) VALUES (3)
INSERT INTO t2 (ID) VALUES (5)

SELECT t1.ID, t1.StatusID
FROM t1 
INNER JOIN t2 ON t1.Id = t2.Id 
WHERE t1.StatusId = 12 

SELECT t1.ID, t1.StatusID
FROM t1 
INNER JOIN t2 ON t1.Id = t2.Id 
    AND t1.StatusId = 12 

Output:

ID          StatusID
----------- -----------
3           12

(1 row(s) affected)

ID          StatusID
----------- -----------
3           12

(1 row(s) affected)
RedFilter
they aren't identical...there's a subtle difference in the 2
Leslie
@Leslie: That is incorrect. For an `INNER JOIN`, the results are identical whether you filter in the `WHERE` clause or `ON` clause.
RedFilter
Agreed. The difference matters for outer joins, but should not matter for inner.
Nikki9696
Why the downvote? Vote with facts, not assumptions :)
RedFilter
I updated my response with a sample case.
RedFilter
A: 

I believe that it is like this:

The first query gets all the results from the joined tables and then filters on the WHERE clause.

The second query gets the results from table 1 where the StatusID = 12 and then performs the join on the result set.

It's a subtle difference.

Edit: Here's a link to an article verifying the difference: http://www.sqlteam.com/article/additional-criteria-in-the-join-clause

Leslie
how does this subtle difference cause a difference in output?
Jimmy
Yes, but the question is why don't they return the same results. They appear to the naked eye to request the same information and I would guess from looking at them that they would generate the same execution plan.
Larry Lustig
-1: Believing is not enough in this case - please provide a sample use case demonstrating the difference in output.
RedFilter
-1 as well. I've tested multiple tables, and the execution plans are identical regardless of using `WHERE t1.StatusId = 12 ` or adding `AND t1.StatusId = 12` as part of the join. SQL should be smart enough to figure out the execution plans based on the query.
LittleBobbyTables
From the article you linked to: "Note that there is no difference in the result sets if we use an inner join instead of an outer join, because the inner join removes the non-matching records regardless."
Jon Schoning
+1  A: 

SQL is a declarative language, and it's up to the DBMS to implement your query. The SQL Server query optimiser does not care whether you inner-join two tables using a JOIN expression, a WHERE expression, an IN expression or an EXISTS expression, internally they are all treated the same. SQL Server has 3 different internal join algorithms, which are selected only on the basis of row count and data distribution, not the exact syntax used to declare them. If you compare the execution plans of the two queries in your example you should find that they are identical, and will therefore produce the same resultset.

Christian Hayter
A: 

The difference under the hood is that the WHERE clause is accessed after all the joins.

This means that maybe in this case there is no difference but that would not always be true when there are multiple joins. They could be factored out of the virtual resultsets (depending on the circumstances) before the WHERE is processed.

Kindly see this link: http://blog.sqlauthority.com/2009/04/06/sql-server-logical-query-processing-phases-order-of-statement-execution/

Mike Cheel
+2  A: 

Looking at the edit history of the question it appears that your queries are along the following lines.

Query One

  INNER JOIN dbo.T2
        ON ...
    LEFT OUTER  JOIN dbo.T3
        ON  ...
    WHERE 
     T3.col = somevalue AND ...

Query Two

  INNER JOIN dbo.T2
        ON ...
    LEFT OUTER  JOIN dbo.T3
        ON  ... AND T3.col = somevalue
   WHERE 
        ... 

The difference between them is that Query One effectively converts the LEFT Join to an INNER Join.

For a left outer join conceptually the following happens.

  1. T2 is joined onto T3 and the predicate in the join clause is evaluated.
  2. Any non matching rows from T2 are added back in with NULL values for the T3 column values.
  3. The WHERE clause is applied.

None of these rows added back in in step 2 will meet the T3.col = somevalue predicate in step 3 as we know that the value of this column for all these rows is NULL.

Martin Smith
+1  A: 

It really makes no difference when you do this in the INNER JOIN.

However, when you use LEFT or RIGHT JOIN, it does make a difference whether you put the additional filter into the JOIN or into the WHERE clause.

When you put the filter into the WHERE clause, SQL Server does the join first, and then completely filters out the rows where the filter does not fit.
--> this will reduce the number of rows which are returned

When you put the filter into the JOIN, SQL Server does the filtering during the join, but only on the table where you put the filter.
You still get all the rows from the other tables, but only those have the data from the filtered table where the filter fits.
--> this will not reduce the number of rows, but the columns with data from the filteres table will be empty in more rows

It's difficult to explain...to make it more clear, here's an example:

Take the sample data from RedFilter's answer:

CREATE TABLE [dbo].[t1](
    [ID] [int] NULL,
    [StatusID] [int] NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[t2](
    [ID] [int] NULL
) ON [PRIMARY]
INSERT INTO t1 (ID, StatusID) VALUES (1, 10)
INSERT INTO t1 (ID, StatusID) VALUES (2, 11)
INSERT INTO t1 (ID, StatusID) VALUES (3, 12)
INSERT INTO t1 (ID, StatusID) VALUES (4, 12)
INSERT INTO t2 (ID) VALUES (1)
INSERT INTO t2 (ID) VALUES (3)
INSERT INTO t2 (ID) VALUES (5)

...and run the following queries on it:

/* this returns four rows, but only two will have data 
from the second table in the second column */
SELECT t1.ID, t2.ID
FROM t1 
LEFT JOIN t2 ON t1.Id = t2.Id 

/* this returns only one row: the one where t2.ID = 1 */
SELECT t1.ID, t2.ID
FROM t1 
LEFT JOIN t2 ON t1.Id = t2.Id 
WHERE t2.ID = 1 

/* this returns four rows as in the first query, but only one 
row will have data in the second column: the one where t2.ID = 1 */
SELECT t1.ID, t2.ID
FROM t1 
LEFT JOIN t2 ON t1.Id = t2.Id 
AND t2.ID = 1 

Note the different results as indicated in the comments.

haarrrgh