ansaurus

Question

Comparing SQL Table to itself (Self-join)

Answer 1

+2 A:

Dear God - never JOIN on an inequality. "Never say never," I know, but... never do that. In fact, it looks to me like you've inverted the JOIN and WHERE conditions.

SELECT t1.id
FROM Test t1
INNER JOIN Test t2
ON ((t1.test1 = t2.test2) OR (t1.test2 = t2.test1))
WHERE t1.id <> t2.id

should work fine, no?

Aaronaught 2009-12-11 19:47:07

Hello, from some tests this still seems slower than using the union :( What is the reason for never joining on an inequality? Wouldn't the WHERE clause be the same? (Although your join potentially returns fewer rows than the other, which could speed up the query. Is this the reason?)

Zenox 2009-12-11 19:55:00

In my test, the UNION version takes over 3 times as long. How are you testing exactly? The reason not to JOIN on an inequality is that the optimizer has to read every single row satisfying that condition (i.e. almost all of them) and filter afterward; this version can make use of an index on column test1 or test2 or both. Unless the optimizer is somehow rewriting your query, you should see a massive performance improvement if you use this version with the proper indexes.

Aaronaught 2009-12-11 20:15:35
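
On the question of whether the WHERE clause "would be the same": for an INNER JOIN the result set does not depend on which predicate sits in ON and which in WHERE; any difference comes from the plan the optimizer chooses. The sketch below checks only the result-set half of that claim. It uses SQLite through Python's sqlite3 module as a stand-in engine, and the sample rows are hypothetical, since the thread does not show the real table contents.

```python
import sqlite3

# Hypothetical rows (id, test1, test2); not the asker's real data.
SAMPLE_ROWS = [(1, 10, 20), (2, 20, 30), (3, 40, 50), (4, 60, 10)]

# Equality in ON, inequality in WHERE.
EQUALITY_IN_ON = """
SELECT t1.id, t2.id
FROM Test t1
INNER JOIN Test t2
    ON t1.test1 = t2.test2
WHERE t1.id <> t2.id
"""

# Inequality in ON, equality in WHERE.
INEQUALITY_IN_ON = """
SELECT t1.id, t2.id
FROM Test t1
INNER JOIN Test t2
    ON t1.id <> t2.id
WHERE t1.test1 = t2.test2
"""


def run_both(rows=SAMPLE_ROWS):
    """Run both query shapes against the same in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Test (id INTEGER PRIMARY KEY, test1 INTEGER, test2 INTEGER)"
    )
    conn.executemany("INSERT INTO Test VALUES (?, ?, ?)", rows)
    equality_rows = sorted(conn.execute(EQUALITY_IN_ON).fetchall())
    inequality_rows = sorted(conn.execute(INEQUALITY_IN_ON).fetchall())
    conn.close()
    return equality_rows, inequality_rows


equality_rows, inequality_rows = run_both()
print(equality_rows == inequality_rows)  # same pairs from both shapes
```

This says nothing about speed. Timings depend on the engine, the indexes, and whether the optimizer rewrites one form into the other, so they have to be measured on the real database.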

Actually, now that I think about it, since your schema appears to have no useful indexes, the query I posted will perform the same as the inequality-join query; no matter what you do, you'll end up with two full clustered-index scans, which is horrible. You need covering indexes on (test1, test2) and (test2, test1) to get any better performance.

Aaronaught 2009-12-11 20:20:37
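
A minimal sketch of the two composite indexes suggested above, again using SQLite via Python's sqlite3 as a stand-in (SQLite has no clustered indexes in the SQL Server sense, so the plans will not match the ones discussed in the thread). The index names and sample rows are hypothetical. The point is only that the indexes change how the query runs, not what it returns.

```python
import sqlite3

# Hypothetical rows (id, test1, test2); not the asker's real data.
ROWS = [(1, 10, 20), (2, 20, 30), (3, 40, 50), (4, 60, 10)]

JOIN_QUERY = """
SELECT t1.id
FROM Test t1
INNER JOIN Test t2
    ON ((t1.test1 = t2.test2) OR (t1.test2 = t2.test1))
WHERE t1.id <> t2.id
"""


def build_db(with_indexes=True):
    """Create an in-memory Test table, optionally with both composite indexes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Test (id INTEGER PRIMARY KEY, test1 INTEGER, test2 INTEGER)"
    )
    conn.executemany("INSERT INTO Test (id, test1, test2) VALUES (?, ?, ?)", ROWS)
    if with_indexes:
        # Index names are made up for this example.
        conn.execute("CREATE INDEX ix_test_test1_test2 ON Test (test1, test2)")
        conn.execute("CREATE INDEX ix_test_test2_test1 ON Test (test2, test1)")
    return conn


def matching_ids(conn):
    """Return the t1.id values from the join, sorted for easy comparison."""
    return sorted(row[0] for row in conn.execute(JOIN_QUERY))


indexed = build_db(with_indexes=True)
print(matching_ids(indexed))

# Print SQLite's plan description; the text varies by SQLite version.
for plan_row in indexed.execute("EXPLAIN QUERY PLAN " + JOIN_QUERY):
    print(plan_row[-1])
```

On SQL Server the equivalent check would be to compare the actual execution plans before and after creating the indexes.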

Answer 2

+2 A:

You only get back both IDs if you select them:

SELECT [LEFT].[ID], [RIGHT].[ID] 
FROM [TEST] AS [LEFT] 
   INNER JOIN [TEST] AS [RIGHT] 
   ON [LEFT].[ID] != [RIGHT].[ID] 
WHERE [LEFT].[TEST1] = [RIGHT].[TEST2]

The reason you only get one row is that only one row (namely row #2) has a TEST1 that is equal to another row's TEST2.

klausbyskov 2009-12-11 19:48:09

+1 because you explained *why* the original syntax wasn't working. And because your answer works. "This answer is useful"

Ian Boyd 2009-12-11 20:01:19

Answer 3

+1 A:

It looks like you're working very quickly toward a Cartesian join. Normally, if you're looking to return duplicates, you need to run something like:

SELECT [LEFT].*
FROM [TEST]  AS [LEFT]
INNER JOIN [TEST] AS [RIGHT]
    ON [LEFT].[test1] = [RIGHT].[test1]
        AND [LEFT].[test2] = [RIGHT].[test2]
        AND [LEFT].[id] <> [RIGHT].[id]

If you need to mix the columns, then mix the needed conditions, but do something like:

SELECT [LEFT].*
FROM [TEST] AS [LEFT]
INNER JOIN [TEST] AS [RIGHT]
    ON (
        [LEFT].[test1] = [RIGHT].[test2]
            OR [LEFT].[test2] = [RIGHT].[test1]
       )
        AND [LEFT].[id] <> [RIGHT].[id]

Using that, you compare the right to the left and the left to the right in each join, eliminating the need for the WHERE altogether.

However, the execution time of this style of query grows quadratically with the number of rows in the table, since you're comparing each row to every other row.

md5sum 2009-12-11 19:53:37
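
To put a number on the row-to-row comparison cost: a self-join restricted only by `id <> id` pairs each of the n rows with the other n - 1 rows, giving n * (n - 1) candidate pairs. A small sketch that counts them, using SQLite via Python's sqlite3 as a stand-in engine and synthetic rows:

```python
import sqlite3


def inequality_pair_count(n):
    """Count the pairs produced by a self-join on l.id <> r.id over n rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Test (id INTEGER PRIMARY KEY, test1 INTEGER, test2 INTEGER)"
    )
    # Synthetic rows; the column values do not matter for the pair count.
    conn.executemany(
        "INSERT INTO Test VALUES (?, ?, ?)", [(i, i, i) for i in range(1, n + 1)]
    )
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM Test AS l INNER JOIN Test AS r ON l.id <> r.id"
    ).fetchone()
    conn.close()
    return count


for n in (10, 100, 1000):
    print(n, inequality_pair_count(n))  # pair count equals n * (n - 1)
```

Ten times as many rows means roughly a hundred times as many pairs to examine, which is why an equality predicate that can use an index matters for larger tables.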
