views:

154

answers:

3

Are there techniques for comparing the same data stored in different schemas? The situation is something like this. If I have a db with schema A and it stores data for a feature in say, 5 tables. Schema A -> Schema B is done during an upgrade process. During the upgrade process some transformation logic is applied and the data is stored in 7 tables in Schema B. What i'm after is some way to verify data integrity, basically i would have to compare different schemas while factoring in the transformation logic. Short of writing some custom t-sql sprocs to compare the data, is there an alternate method? I'm leaning towards python to automate this, are there any python modules that would help me out? To better illustrate my question the following diagram is a rough picture of one of the many data sets i would need to compare, Properties 1,2,3 and 4 are migrated from Schema source to destination, but they are spread across different tables.

Table1Src                             Table1Dest
  |                                       |
  --ID(Primary Key)                       --ID(Primary Key)
  --Property1                             --Property1
  --Property2                             --Property5
  --Property3                             --Property6

Table2Src                             Table2Dest
  |                                       |
  --ID(Foreign Key->Table1Src)            --ID(Foreign Key->Table1Dest)
  --Property4                             --Property2
                                          --Property3

                                      Table3Dest
                                          |
                                          --ID(Foreign Key->Table1Dest)
                                          --Property4
                                          --Property7
+1  A: 

Basically, you should create object representations for both schema versions, and then compare objects. This is best done if they all fit into memory simultaneously; if not, you need to iterate over all objects in one representation, fetch the corresponding object in the other representation, compare them, and then do the same vice versa.

The difficult part may be to obtain object representations; you can see whether SQLAlchemy can be used conveniently for your tables. SQLAlchemy is, in principle, capable of mapping existing schema definitions onto objects.

Martin v. Löwis
A: 

I've used SQLAlchemy successfully for migration between one schema and another - that's a similar process (as indicated by Martin v. Löwis) as comparison. Especially if you use an .equals(other) method.

Matthew Schinckel
+2  A: 

Make "views" on both the schemas that translate to the same buisness representation of data. Export these views to flat files and then you can use any plain vanilla file diff utility to compare and point out differences.

Learning