I have two tables, A and B, that have the same structure (about 30+ fields). Is there a short, elegant way to join these tables and only select rows where one or more columns differ? I could certainly write some script that creates the query with all the column names but maybe there is an SQL-only solution.

To put it another way: is there a short substitute for this?

SELECT *
FROM   table_a a
  JOIN table_b b ON a.pkey=b.pkey
WHERE  a.col1 != b.col1
    OR a.col2 != b.col2
    OR a.col3 != b.col3 # .. repeat for 30 columns
A: 

The best way I can think of is to create a temporary table with the same structure, plus a unique constraint across the 30 fields you want to check. Insert all rows from table A into the temp table, then all rows from table B using INSERT IGNORE. As the rows from B go in, any row that matches an existing row in every constrained column is dropped. The temp table then holds every row from A plus only those rows from B that differ in at least one column, so a primary key that appears twice marks a row that differs between the tables. You can then select those rows from the temp table.
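
A minimal sketch of that idea, assuming a hypothetical three-column layout (pkey, col1, col2, col3) in place of the real 30+ fields; the name diff_tmp and the column types are made up for illustration. Two caveats apply: MySQL allows at most 16 columns in a single index, and a unique index treats NULLs as distinct, so the columns are declared NOT NULL here.

```sql
-- Hypothetical layout standing in for the real table structure.
CREATE TEMPORARY TABLE diff_tmp (
  pkey INT         NOT NULL,
  col1 VARCHAR(50) NOT NULL,
  col2 INT         NOT NULL,
  col3 DATE        NOT NULL,
  UNIQUE KEY uq_all_columns (pkey, col1, col2, col3)
);

-- All rows of A go in first.
INSERT INTO diff_tmp
SELECT pkey, col1, col2, col3 FROM table_a;

-- Rows of B that are identical to a row of A violate the unique key
-- and are silently skipped by INSERT IGNORE.
INSERT IGNORE INTO diff_tmp
SELECT pkey, col1, col2, col3 FROM table_b;

-- A key that now appears twice has different contents in A and B.
SELECT pkey
FROM   diff_tmp
GROUP  BY pkey
HAVING COUNT(*) > 1;
```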

Zak
+1  A: 

There is a standard SQL way to do this (EXCEPT, which Oracle calls MINUS), but MySQL did not support it before version 8.0.31.

Failing that, you could try this:

SELECT a.* FROM a NATURAL LEFT JOIN b
    WHERE b.pkcol IS NULL

According to the MySQL documentation, a NATURAL JOIN joins the two tables on all identically named columns. By keeping only the a records where the b primary key column comes back NULL, you effectively get only the a records with no identical record in b.

FYI: This is based on the MySQL documentation, not personal experience.
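
A sketch of running the check in both directions in one statement; pkcol is the same placeholder key name as above, and the side column is added only for illustration. Because NATURAL JOIN compares with =, a row containing NULL in any column never matches and will always be reported.

```sql
-- Rows of a with no identical row in b, then rows of b with no identical row in a.
SELECT 'only in a' AS side, a.*
FROM   a NATURAL LEFT JOIN b
WHERE  b.pkcol IS NULL
UNION ALL
SELECT 'only in b' AS side, b.*
FROM   b NATURAL LEFT JOIN a
WHERE  a.pkcol IS NULL;
```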

Larry Lustig
Your example would need to be NATURAL LEFT JOIN in order to pull in rows of a with no matches in b.
Martin
... and you need to repeat the query, swapping a with b, in order to find rows in b with no match in a.
Martin
I beg your pardon, that is correct. I'm editing it now.
Larry Lustig
+1  A: 

Taking only the data into account, there is no short way. In fact, your query is the only solid way to do it. One thing to be careful with is the comparison of NULL values in nullable columns: a != b evaluates to NULL rather than true when either side is NULL, so such rows are silently skipped. A query with many ORs also tends to be slow, not to mention one spanning 30 columns.
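
For the NULL issue, one option in MySQL is the NULL-safe equality operator <=>, which treats two NULLs as equal and a NULL versus a value as unequal. A sketch using the column names from the question:

```sql
SELECT a.*, b.*
FROM   table_a a
  JOIN table_b b ON a.pkey = b.pkey
WHERE  NOT (a.col1 <=> b.col1)
    OR NOT (a.col2 <=> b.col2)
    OR NOT (a.col3 <=> b.col3);  -- .. repeat for the remaining columns
```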

Also, your query will not include records in table_b that have no corresponding record in table_a, or vice versa. So ideally you would use a FULL JOIN.
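
MySQL has no FULL JOIN, so the unmatched rows have to be collected with two anti-joins. A rough sketch, where the status column is made up for illustration:

```sql
-- Keys present in table_a but missing from table_b.
SELECT a.pkey, 'missing in table_b' AS status
FROM   table_a a
  LEFT JOIN table_b b ON a.pkey = b.pkey
WHERE  b.pkey IS NULL
UNION ALL
-- Keys present in table_b but missing from table_a.
SELECT b.pkey, 'missing in table_a' AS status
FROM   table_b b
  LEFT JOIN table_a a ON a.pkey = b.pkey
WHERE  a.pkey IS NULL;
```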

If you need to perform this operation often, you could introduce an additional column that is updated whenever anything in the row changes. This could be as simple as a TIMESTAMP column maintained by UPDATE/INSERT triggers. Then, when you compare, you even know which record is more recent. But again, this is not a bulletproof solution.
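
One possible shape for this in MySQL, using the built-in ON UPDATE CURRENT_TIMESTAMP clause instead of hand-written triggers; updated_at is a hypothetical column name. MySQL only bumps the value when an UPDATE actually changes some column, and differing timestamps only flag candidate rows, they do not prove the contents differ.

```sql
-- Hypothetical change-tracking column, maintained by MySQL itself.
ALTER TABLE table_a
  ADD COLUMN updated_at TIMESTAMP NOT NULL
      DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;

ALTER TABLE table_b
  ADD COLUMN updated_at TIMESTAMP NOT NULL
      DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;

-- Candidate rows: same key, modified at different times.
SELECT a.pkey,
       a.updated_at AS a_updated,
       b.updated_at AS b_updated
FROM   table_a a
  JOIN table_b b ON a.pkey = b.pkey
WHERE  a.updated_at <> b.updated_at;
```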

van