I work with data that comes from multiple sources that I have no control over. These sources tend to have duplicates in the "key" values. I need to keep any of these duplicate values form matching in a join.
Using the following data
T1
| ID | FirstKey | SecondKey | ThirdKey | AdditionalColumns |
+----+----------+-----------+----------+---------------------+
| 01 | Prod1 | ABC1 | 201 | Jun 2010, A, 101 |
| 02 | Prod2 | DEF2 | 202 | May 2009, A, 101 |
| 03 | Prod2 | DEF2 | 202 | May 2010, S, 101 |
| 04 | Prod3 | | 206 | Jun 2010, A, 103 |
| 05 | Prod4 | | 207 | Jun 2011, S, 103 |
T2
| ID | FirstKey | SecondKey | ThirdKey | AdditionalColumns |
+----+----------+-----------+----------+---------------------+
| 01 | Prod1 | ABC1 | 201 | Jun 2010, A, 101 |
| 02 | Prod2 | DEF2 | | May 2009, A, 101 |
| 03 | Prod2 | DEF2 | 202 | May 2010, S, 101 |
| 04 | Prod3 | | | Jun 2010, A, 103 |
| 05 | Prod4 | | 207 | Jun 2011, S, 103 |
| 06 | Prod1 | ABC1 | 201 | Jun 2010, T, 101 |
Now if we do the query:
SELECT
T1.FirstKey, T1.SecondKey, T1.ThirdKey,
T2.FirstKey, T2.SecondKey, T2.ThirdKey,
T1.AdditionalColumns, T2.AdditionalColumns
FROM
T1 JOIN T2 ON T1.FirstKey = T2.FirstKey
AND T1.SecondKey = T2.SecondKey
AND T1.SecondKey IS NOT NULL
UNION
SELECT
T1.FirstKey, T1.SecondKey, T1.ThirdKey,
T2.FirstKey, T2.SecondKey, T2.ThirdKey,
T1.AdditionalColumns, T2.AdditionalColumns
FROM
T1 JOIN T2 ON T1.FirstKey = T2.FirstKey
AND T1.ThirdKey = T2.ThirdKey
AND T1.SecondKey IS NULL
We get the following Results
FirstKey SecondKey ThirdKey FirstKey SecondKey ThirdKey AdditionalColumns AdditionalColumns
-------- --------- -------- -------- --------- -------- ----------------- -----------------
Prod1 ABC1 201 Prod1 ABC1 201 Jun 2010, A, 101 Jun 2010, A, 101
Prod1 ABC1 201 Prod1 ABC1 201 Jun 2010, A, 101 Jun 2010, T, 101
Prod2 DEF2 202 Prod2 DEF2 202 May 2009, A, 101 May 2010, S, 101
Prod2 DEF2 202 Prod2 DEF2 202 May 2010, S, 101 May 2010, S, 101
Prod4 NULL 207 Prod4 NULL 207 Jun 2011, S, 103 Jun 2011, A, 103
I need the query to only return the records with an authoritative match. e.g. Only 1 match between the tables.
FirstKey SecondKey ThirdKey FirstKey SecondKey ThirdKey AdditionalColumns AdditionalColumns
-------- --------- -------- -------- --------- -------- ----------------- -----------------
Prod4 NULL 207 Prod4 NULL 207 Jun 2011, S, 103 Jun 2011, A, 103
Is there a way to do this in the JOIN?
Currently I can get uniques by making CTE for each table that guarantees uniqueness on the keys used in the join. This works, but is ugly and adds significant work to the query.
Is there another way to do this join that will exclude the duplicate matches? This assumes I cannot programatically exclude any of the duplicate rows based on the AdditionalColumns data.
I run into this over and over so the CTE method seems just kludgey as it must be a problem that's been solved.