I'm trying to find duplicate rows based on mixed columns. This is an example of what I have:
CREATE TABLE Test
(
id INT PRIMARY KEY,
test1 varchar(124),
test2 varchar(124)
)
INSERT INTO TEST ( id, test1, test2 ) VALUES ( 1, 'A', 'B' )
INSERT INTO TEST ( id, test1, test2 ) VALUES ( 2, 'B', 'C' )
Now if I run this query:
SELECT [LEFT].[ID]
FROM [TEST] AS [LEFT]
INNER JOIN [TEST] AS [RIGHT]
ON [LEFT].[ID] != [RIGHT].[ID]
WHERE [LEFT].[TEST1] = [RIGHT].[TEST2]
I would expect to get back both id's. (1 and 2), however I only ever get back the one row.
My thoughts would be that it should compare each row, but I guess this is not correct? To fix this I had changed my query to be:
SELECT [LEFT].[ID]
FROM [TEST] AS [LEFT]
INNER JOIN [TEST] AS [RIGHT]
ON [LEFT].[ID] != [RIGHT].[ID]
WHERE [LEFT].[TEST1] = [RIGHT].[TEST2]
OR [LEFT].[TEST2] = [RIGHT].[TEST1]
Which gives me both rows, but the performance degrades extremely quickly based on the number of rows.
The final solution I came up for for performance and results was to use a union:
SELECT [LEFT].[ID]
FROM [TEST] AS [LEFT]
INNER JOIN [TEST] AS [RIGHT]
ON [LEFT].[ID] != [RIGHT].[ID]
WHERE [LEFT].[TEST1] = [RIGHT].[TEST2]
UNION
SELECT [LEFT].[ID]
FROM [TEST] AS [LEFT]
INNER JOIN [TEST] AS [RIGHT]
ON [LEFT].[ID] != [RIGHT].[ID]
WHERE [LEFT].[TEST2] = [RIGHT].[TEST1]
But overall, I'm obviously missing an understanding of why this is not working which means that I'm probably doing something wrong. Could someone point me in the proper direction?