tags:

views:

941

answers:

3

What is a good way to calculate difference (in a sense what should be added and deleted from one table to get another) between tables in MySQL?

+1  A: 
SELECT DISTINCT id FROM a WHERE NOT EXISTS (SELECT * FROM b WHERE a.id = b.id);
SELECT DISTINCT id FROM b WHERE NOT EXISTS (SELECT * FROM a WHERE a.id = b.id);
vava
A: 

You can also use a left outer join (the first tells you where a row exists in table a and not b, the second vice-versa):

SELECT a.id FROM a LEFT JOIN b ON a.id = b.id WHERE b.id IS NULL
SELECT b.id FROM b LEFT JOIN a ON b.id = a.id WHERE a.id IS NULL
BrynJ
+2  A: 

Neither of the answers posted so far (from BrynJ and Vadim) does a very thorough job. And doing the thorough job is also incredibly hard. Both answers assume that it is sufficient to know which ID numbers are present in each table. However, in general, tables have more than one column.

Let's call the tables A and B.

One important question is "do the two tables have the same schema"? If not, then one issue is which columns need to be added to A and which need to be added to B to make their schemas the same. This is a metadata query, answerable from the system catalog. Which values should be inserted in the columns added to the tables is an interesting question.

Let's assume that the tables actually have the same schema, including the same primary key and the same functional dependencies between columns. Let's also assume that there is an ID column (storing a unique integer), and Name column (a string), and a RefDate column of type DATE.

Table A                                 Table B
ID  Name         RefDate                ID  Name         RefDate
1   Frederick    2007-01-23             1   Josephine    2009-01-10

Now, what needs to be inserted, deleted, updated from each table to make them the same?

I think it is fair to say that there is no single answer to that without knowing a lot more context. It might be that Frederick has undergone gender-change surgery since 2007, and the entry in B represents her new identity. Or it might be a blunder; the database should not store both those records. Or there might be another resolution.

Unfortunately, the queries from BrynJ and Vadim would both indicate that there is no difference between A and B, which is a dubious proposition to me.

Incidentally, note that comparing rows when the rows might have nulls is more complex than when they don't. For example, consider comparing names:

No nulls:

(A.Name = B.Name)

With nulls:

(A.Name = B.Name OR (A.Name IS NULL AND B.Name IS NULL))

One more reason to shun nulls whenever you can.

Jonathan Leffler