views:

447

answers:

8

Compare these 2 queries. Is it faster to put the filter on the join criteria or in the were clause. I have always felt that it is faster on the join criteria because it reduces the result set at the soonest possible moment, but I don't know for sure.

I'm going to build some tests to see, but I also wanted to get opinions on which would is clearer to read as well.

Query 1

SELECT      *
FROM        TableA a
INNER JOIN  TableXRef x
        ON  a.ID = x.TableAID
INNER JOIN  TableB b
        ON  x.TableBID = b.ID
WHERE       a.ID = 1            /* <-- Filter here? */

Query 2

SELECT      *
FROM        TableA a
INNER JOIN  TableXRef x
        ON  a.ID = x.TableAID
        AND a.ID = 1            /* <-- Or filter here? */
INNER JOIN  TableB b
        ON  x.TableBID = b.ID

EDIT

I ran some tests and the results show that it is actually very close, but the WHERE clause is actually slightly faster! =)

I absolutely agree that it makes more sense to apply the filter on the WHERE clause, I was just curious as to the performance implications.

ELAPSED TIME WHERE CRITERIA: 143016 ms
ELAPSED TIME JOIN CRITERIA: 143256 ms

TEST

SET NOCOUNT ON;

DECLARE @num    INT,
        @iter   INT

SELECT  @num    = 1000, -- Number of records in TableA and TableB, the cross table is populated with a CROSS JOIN from A to B
        @iter   = 1000  -- Number of select iterations to perform

DECLARE @a TABLE (
        id INT
)

DECLARE @b TABLE (
        id INT
)

DECLARE @x TABLE (
        aid INT,
        bid INT
)

DECLARE @num_curr INT
SELECT  @num_curr = 1

WHILE (@num_curr <= @num)
BEGIN
    INSERT @a (id) SELECT @num_curr
    INSERT @b (id) SELECT @num_curr

    SELECT @num_curr = @num_curr + 1
END

INSERT      @x (aid, bid)
SELECT      a.id,
            b.id
FROM        @a a
CROSS JOIN  @b b

/*
    TEST
*/
DECLARE @begin_where    DATETIME,
        @end_where      DATETIME,
        @count_where    INT,
        @begin_join     DATETIME,
        @end_join       DATETIME,
        @count_join     INT,
        @curr           INT,
        @aid            INT

DECLARE @temp TABLE (
        curr    INT,
        aid     INT,
        bid     INT
)

DELETE FROM @temp

SELECT  @curr   = 0,
        @aid    = 50

SELECT  @begin_where = CURRENT_TIMESTAMP
WHILE (@curr < @iter)
BEGIN
    INSERT      @temp (curr, aid, bid)
    SELECT      @curr,
                aid,
                bid
    FROM        @a a
    INNER JOIN  @x x
            ON  a.id = x.aid
    INNER JOIN  @b b
            ON  x.bid = b.id
    WHERE       a.id = @aid

    SELECT @curr = @curr + 1
END
SELECT  @end_where = CURRENT_TIMESTAMP

SELECT  @count_where = COUNT(1) FROM @temp
DELETE FROM @temp

SELECT  @curr = 0
SELECT  @begin_join = CURRENT_TIMESTAMP
WHILE (@curr < @iter)
BEGIN
    INSERT      @temp (curr, aid, bid)
    SELECT      @curr,
                aid,
                bid
    FROM        @a a
    INNER JOIN  @x x
            ON  a.id = x.aid
            AND a.id = @aid
    INNER JOIN  @b b
            ON  x.bid = b.id

    SELECT @curr = @curr + 1
END
SELECT  @end_join = CURRENT_TIMESTAMP

SELECT  @count_join = COUNT(1) FROM @temp
DELETE FROM @temp

SELECT  @count_where AS count_where,
        @count_join AS count_join,
        DATEDIFF(millisecond, @begin_where, @end_where) AS elapsed_where,
        DATEDIFF(millisecond, @begin_join, @end_join) AS elapsed_join
A: 

I guess that the first, because it makes a more specific filter over the data. But you should see the execution plan, as with any optimization, because it can be very different deppending on size of data, server hardware, etc.

eKek0
A: 

Is it faster? Try it and see.

Which is easier to read? The first to me looks more "correct", as the moved condition is nothing really to do with the join.

David M
Oh, sorry man!!!
Quassnoi
+1  A: 

With any query optimizer worh a cent.... they are identical.

TomTom
I'm quite sure that, with any real workload, they are not identical. If you almost don't have data, then the question is worthless.
eKek0
Check it out under real workload. Basically - if they generate the same execution plan, they... are identical in performance.At least for normal / simple cases (i.e. not the one joining 14 tables) I am quite sure they are identical ;)
TomTom
+4  A: 

Performance-wise, they are the same (and produce the same plans)

Logically, you should make the operation that still has sense if you replace INNER JOIN with a LEFT JOIN.

In your very case this will look like this:

SELECT  *
FROM    TableA a
LEFT JOIN
        TableXRef x
ON      x.TableAID = a.ID
        AND a.ID = 1
LEFT JOIN
        TableB b
ON      x.TableBID = b.ID

or this:

SELECT  *
FROM    TableA a
LEFT JOIN
        TableXRef x
ON      x.TableAID = a.ID
LEFT JOIN
        TableB b
ON      b.id = x.TableBID
WHERE   a.id = 1

The former query will not return any actual matches for a.id other than 1, so the latter syntax (with WHERE) is logically more consistent.

Quassnoi
+2  A: 

As far as the two methods go.

  • JOIN/ON is for joining tables
  • WHERE is for filtering results

Whilst you can use them differently it always seems like a smell to me.

Deal with performance when it is a problem. Then you can look into such "optimisations".

Robin Day
A: 

It is really unlikely that the placement of this join will be the deciding factor for performance. I am not intimately familiar with the execution planning for tsql, but it's likely that they will be optimized automatically to similar plans.

Joseph Mastey
A: 

Rule #0: Run some benchmarks and see! The only way to really tell which will be faster is to try it. These types of benchmarks are very easy to perform using the SQL profiler.

Also, examine the execution plan for the query written with a JOIN and with a WHERE clause to see what differences stand out.

Finally, as others have said, these two should be treated identically by any decent optimizer, including the one built into SQL Server.

David Lively
But only for inner joins. The results set will be very different for out joins.
HLGEM
Of course. Fortunately, the provided example uses inner joins.
David Lively
+2  A: 

For inner joins it doesn't matter where you put your criteria. The SQL compiler will transform both into an execution plan in which the filtering occurs below the join (ie. as if the filter expressions appears is in the join condition).

Outer joins are a different matter, since the place of the filter changes the semantics of the query.

Remus Rusanu