You can simplify this problem drastically if you require that certain columns do match, or at least start with some expectations of which columns should match. In other words, instead of looking at this as a non-matching problem, redefine it as a partial matching problem.
Let's say you expect agent
and agency
to match, but customer
and company
might not. This isn't too difficult:
SELECT
i.agent, i.agency, i.customer, i.company, p.customer, p.company,
CASE
WHEN i.customer = p.Customer THEN 'Y'
ELSE 'N'
END AS matchescustomer,
CASE
WHEN i.company = p.Company THEN 'Y'
ELSE 'N'
END AS matchescompany
FROM table1 i
INNER JOIN table2 p
ON p.agent = i.agent
AND p.agency = i.agency
If you want to check for other partial matches, just reorder the columns. Instead of joining on agent
and agency
, join on agent
and customer
, or whatever.
If you only expect a few different kinds of partial matches, you can write a few different queries similar to the one above and put them together with a UNION
(or UNION ALL
if you don't mind duplicates). In other words:
SELECT (columns)
FROM table1 i INNER JOIN table2 p
ON p.agent = i.agent AND p.agency = i.agency
UNION
SELECT (columns)
FROM table1 i INNER JOIN table2 p
ON p.agent = i.agent AND p.customer = i.customer
Now if you're looking to get every conceivable mismatch, then this is quickly going to get out of control, so you might want to adapt a more heuristic method, and search for partial matches that match at least a certain number of columns (say 3). Then you can restrict the obnoxiousness to at most the number of columns you have to compare:
;WITH PartialMatches_CTE AS
(
SELECT i.agent AS iagent, p.agent AS pagent, ... (etc.)
FROM table1 i INNER JOIN table2 p ON p.agent = i.agent
UNION ALL
SELECT (...) FROM table1 INNER JOIN table2 ON p.agency = i.agency
UNION ALL
SELECT (...) FROM table1 INNER JOIN table2 ON p.company = i.company
... and so on
),
ResolvedMatches_CTE AS
(
SELECT DISTINCT
iagent, pagent, iagency, pagency, ...,
CASE WHEN pagent = iagent THEN 'Y' ELSE 'N' END AS agentmatch,
CASE WHEN pagency = iagency THEN 'Y' ELSE 'N' END AS agencymatch,
...,
(CASE WHEN pagent = iagent THEN 1 ELSE 0 END +
CASE WHEN pagency = iagency THEN 1 ELSE 0 END +
...) AS MatchCount
FROM PartialMatches_CTE
)
SELECT *
FROM ResolvedMatches_CTE
WHERE MatchCount >= 3
Now, having said all this, there's one thing I'm wondering...
These two data tables aren't, perchance, sequentially related, are they? As in, the 3rd row in table1 always maps to the 3rd row in table 2, but may not match all columns? It's a shot in the dark, but if that is indeed the case, then we could make this way simpler. Otherwise, the last query here should probably do what you want without becoming too much of an unmaintainable mess.
Note that performance is likely going to stink for all of these queries. Hopefully your data sets aren't too large. AFAIK there is no easy way to really optimize this kind of thing.