We have one table (the original table) that is backed up into another table (the backup table), so the two tables have exactly the same schema.

At the beginning, both tables contain exactly the same set of data. After some time, I need to verify whether the dataset in the original table has changed.

In order to do this I have to compare the dataset in the original table against the backup table.

Let's say the original table has the following schema:

`create table LemmasMapping (
   lemma1 int,
   lemma2 int,
   index ix_lemma1 using btree (lemma1),
   index ix_lemma2 using btree (lemma2)
)`

How could I achieve the dataset comparison?

Update: the table does not have a primary key. It simply stores mappings between two ids.

A: 
select count(*) 
from lemmas as original_table 
      full join backup_table using (lemma_id)
where backup_table.lemma_id is null
      or original_table.lemma_id is null
      or original_table.lemma != backup_table.lemma

The full join / check for null should cover additions or deletions as well as changes.

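  • backup_table.lemma_id is null = addition (the row exists only in the original table)
  • original_table.lemma_id is null = deletion (the row exists only in the backup table)
  • neither is null = change (the row exists in both tables, but lemma differs)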
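
For the table in the question, which has no key column to join on, a different technique can answer the same question: stack both tables with UNION ALL and count each (lemma1, lemma2) pair per source. This is only a sketch in MySQL syntax (MySQL has no FULL JOIN). The backup table name LemmasMapping_backup is an assumption, since the question does not name it.

```sql
-- Assumed names: LemmasMapping is the original table,
-- LemmasMapping_backup is a hypothetical name for the backup table.
SELECT lemma1,
       lemma2,
       SUM(src = 'original') AS in_original,  -- copies of this pair in the original
       SUM(src = 'backup')   AS in_backup     -- copies of this pair in the backup
FROM (
    SELECT 'original' AS src, lemma1, lemma2 FROM LemmasMapping
    UNION ALL
    SELECT 'backup'   AS src, lemma1, lemma2 FROM LemmasMapping_backup
) AS both_tables
GROUP BY lemma1, lemma2
HAVING in_original <> in_backup;              -- keep only pairs whose counts differ
```

An empty result means both tables hold the same rows, duplicates included. Otherwise each returned pair shows how many copies each table contains.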
Kyle Butt
+4  A: 

I would write three queries.

  1. An inner join to pick up the rows where the primary key exists in both tables, but there is a difference in the value of one or more of the other columns. This would pick up changed rows in original.

  2. A left outer join to pick up the rows that are in the original table, but not in the backup table (i.e. a row in original has a primary key that does not exist in backup). This would return rows inserted into original.

  3. A right outer join to pick up the rows in backup which no longer exist in original. This would return rows that have been deleted from original.

You could union the three queries together to return a single result set. If you did this you would need to add a column to indicate what type of row it is (updated, inserted or deleted).
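
The three queries and the extra type column could be sketched as follows. The schema is hypothetical: it assumes original_table and backup_table each have a primary key column id and one data column val, neither of which comes from the question. The `<=>` operator is MySQL's NULL-safe equality; other engines would need a different comparison.

```sql
-- Hypothetical schema: original_table(id PRIMARY KEY, val), backup_table(id PRIMARY KEY, val).

-- 1. Rows present in both tables whose data differs.
SELECT 'updated' AS change_type, o.id, o.val
FROM original_table AS o
INNER JOIN backup_table AS b ON b.id = o.id
WHERE NOT (o.val <=> b.val)

UNION ALL

-- 2. Rows present only in the original table.
SELECT 'inserted', o.id, o.val
FROM original_table AS o
LEFT JOIN backup_table AS b ON b.id = o.id
WHERE b.id IS NULL

UNION ALL

-- 3. Rows present only in the backup table.
SELECT 'deleted', b.id, b.val
FROM original_table AS o
RIGHT JOIN backup_table AS b ON b.id = o.id
WHERE o.id IS NULL;
```

UNION ALL is used rather than UNION because the three branches cannot overlap, so there are no duplicates to remove.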

With a bit of effort you might be able to do this in one query using a full outer join. Be careful with outer joins, as they behave differently in different SQL engines. A predicate on the outer-joined table that is placed in the where clause instead of the join clause can turn your outer join into an inner join, because it discards the rows where that table's columns are null.
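
A small illustration of that pitfall, using the same hypothetical columns id and val (assumed for the example, not taken from the question):

```sql
-- Filter in WHERE: unmatched rows have b.val = NULL, so they are discarded
-- and the query returns the same rows as an inner join would.
SELECT o.id, b.val
FROM original_table AS o
LEFT JOIN backup_table AS b ON b.id = o.id
WHERE b.val = 1;

-- Filter in ON: every row of original_table is kept; b.val is NULL
-- wherever no backup row with val = 1 matches.
SELECT o.id, b.val
FROM original_table AS o
LEFT JOIN backup_table AS b ON b.id = o.id AND b.val = 1;
```

A check such as `WHERE b.id IS NULL` is the deliberate exception: it is meant to keep only the unmatched rows.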

Mike Thompson
This works. Thanks!
SiLent SoNG
+4  A: 

You can just use CHECKSUM TABLE and compare the results. You can even alter the table to enable live checksums so that they are continuously available.

CHECKSUM TABLE original_table, backup_table;

It doesn't require the tables to have a primary key.
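
A sketch of the live-checksum variant, assuming the tables use the MyISAM storage engine (the MySQL documentation describes the CHECKSUM table option as applying to MyISAM only):

```sql
-- Ask MySQL to maintain a checksum that is updated as the tables change.
-- Assumption: both tables are MyISAM.
ALTER TABLE original_table CHECKSUM = 1;
ALTER TABLE backup_table   CHECKSUM = 1;

-- QUICK reports the stored live checksum without scanning the rows;
-- it returns NULL for a table that has no live checksum.
CHECKSUM TABLE original_table, backup_table QUICK;
```

Differing checksums mean the tables differ. Equal checksums make identical contents very likely but, as with any hash, do not strictly prove it.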

Josh Davis
+1 for checksum
SiLent SoNG
A: 

Hi there,

If you are using SQL Server, you might want to give Volpet's Table Diff a try:

http://www.volpet.com/

You can try a fully-functional copy for 30 days.

Gia
A: 

For the lazier or more SQL-averse developer working with MS SQL Server, I would recommend SQL Delta (www.sqldelta.com) for this and any other database-diff type work. It has a great GUI, is quick and accurate, and can diff all database objects, generate and run the necessary change scripts, and synchronise entire databases. It's the next best thing to a DBA ;-)

I think there is a similar tool available from RedGate called SQL Compare. I believe some editions of the latest version of Visual Studio (2010) also include a very similar tool.

5arx
UPDATE - I've been using the Data compare tools in VS2010 and imho they're not fit to like the boots of either of the products above...
5arx
* lick the boots * even...
5arx