I have a relational database with about 100 tables. Each table has a unique numeric primary key with synthetic values, and many foreign keys link the tables. The tables are not big (tens or hundreds of records). This is a SQLite database.

For testing purposes, I need to compare two copies of the database with a Linux script (simple bash scripts, perl, diff, and sed are available). I need to validate that both databases contain the same number of records and that the records have the same content, and to dump the differences. The problem is that the key values are allowed to differ as long as the relations are the same.

For example:

There is a table "country" with primary key "ix_country" and a field "name", and a table "customer" with a field "name", primary key "ix_customer", and foreign key "ix_country".

These two databases are equal. First database:

country: name="USA" ix_country=1; customer: name="Joe" ix_customer=10 ix_country=1

Second database:

country: name="USA" ix_country=1771; customer: name="Joe" ix_customer=27 ix_country=1771

Both copies have the same structure.

Is there an easy way to do this?

Update:

One more requirement: the script must be robust against changes in the structure. It must keep working if a table or a field is added or deleted.
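One way to stay robust against structure changes is to read the table list, columns, and foreign keys from the database itself instead of hard-coding them. The following is a minimal Python sketch (not the bash/perl tooling mentioned above), assuming the foreign keys are actually declared in the schema with an explicit parent column; `describe_schema` and the in-memory demo tables are hypothetical names used only for illustration.

```python
import sqlite3


def describe_schema(conn):
    """Return {table: {"columns": [...], "pk": [...], "fks": {column: (parent_table, parent_column)}}}."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in names:
        if table.startswith("sqlite_"):
            continue  # skip SQLite's internal tables
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        fks = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        schema[table] = {
            "columns": [c[1] for c in cols],
            "pk": [c[1] for c in sorted(cols, key=lambda c: c[5]) if c[5] > 0],
            "fks": {fk[3]: (fk[2], fk[4]) for fk in fks},
        }
    return schema


# Demo on the example schema from the question (assumed DDL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (ix_country INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer (
        ix_customer INTEGER PRIMARY KEY,
        name TEXT,
        ix_country INTEGER REFERENCES country(ix_country)
    );
""")
schema = describe_schema(conn)
```

If the foreign keys exist only by naming convention (for example, every `ix_*` column refers to the table of the same name), the `fks` mapping would have to be derived from the column names instead.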

Update 2:

I started to work on the problem myself. The general strategy is to write an SQL script which creates an "identity map" file. For each record, the map contains its primary key value (the "artificial identity") and a "natural identity" key, a string which uniquely identifies the record. Some tables in the database have a unique natural key (like the country name in my example). Other tables require an ordinal number in a sequence, and still others combine their own identity with the identity of their parent (possibly recursively, if the parent also has a parent).

All records are dumped to a second text file by a second SQL script, in a format which marks the artificial identities.

Then a perl script replaces all artificial identities in the second file with their natural identities from the map.

Then the result is sorted and diffed.

+5  A: 

Is there an easy way to do this?

No. It's going to take programming work.

Andy Lester
I can write the program for the task; I just hope there is a trick unknown to me or a preexisting tool :-)
danatel
+3  A: 

If the database is pretty simple, running a query on the command line that dumps all data, properly formatted and sorted and without the ids, and then comparing the output with diff could get you a long way.

e.g.

sqlite3 test.db 'CREATE TABLE Country (id  integer, name varchar(20))'
sqlite3 test.db 'CREATE TABLE Customer (id  integer, name varchar(20), country integer)'
sqlite3 test.db "insert into country values (1, 'USA')"
sqlite3 test.db "insert into country values (2, 'Belgium')"
sqlite3 test.db "insert into customer values (1, 'Joe', 1)"
sqlite3 test.db "insert into customer values (2, 'Peter', 2)"

sqlite3 test.db 'select cust.name, c.name from customer cust, country c where cust.country = c.id order by c.name, cust.name'

Peter|Belgium
Joe|USA

sqlite3 test.db 'select cust.name, c.name from customer cust, country c where cust.country = c.id order by c.name, cust.name' >db1.txt

Running the last query from a bash script against both databases and diffing the two output files will show you the differing customers without programming.

This breaks down, of course, when the data model is more convoluted.
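For illustration, the same dump-and-diff idea can be sketched in Python with the standard `sqlite3` and `difflib` modules, which avoids temporary files. This is only a sketch under the two-table example above; `make_db` and `dump` are hypothetical helpers, and the in-memory databases stand in for the two real copies.

```python
import difflib
import sqlite3

# Same join as above: only natural values are selected, never the ids.
QUERY = """
    SELECT cust.name, c.name
    FROM customer AS cust
    JOIN country AS c ON cust.country = c.id
    ORDER BY c.name, cust.name
"""


def make_db(countries, customers):
    """Build an in-memory copy of the example database (stand-in for a real file)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Country (id integer, name varchar(20))")
    conn.execute("CREATE TABLE Customer (id integer, name varchar(20), country integer)")
    conn.executemany("INSERT INTO Country VALUES (?, ?)", countries)
    conn.executemany("INSERT INTO Customer VALUES (?, ?, ?)", customers)
    return conn


def dump(conn, query=QUERY):
    """Return the query result as 'a|b' lines, like the sqlite3 command-line output."""
    return ["|".join("" if v is None else str(v) for v in row)
            for row in conn.execute(query)]


db1 = make_db([(1, "USA"), (2, "Belgium")], [(1, "Joe", 1), (2, "Peter", 2)])
# Same content, different key values.
db2 = make_db([(1771, "USA"), (5, "Belgium")], [(27, "Joe", 1771), (3, "Peter", 5)])
# Joe now points at Belgium: a real difference.
db3 = make_db([(1, "USA"), (2, "Belgium")], [(1, "Joe", 2), (2, "Peter", 2)])

same = list(difflib.unified_diff(dump(db1), dump(db2), "db1", "db2", lineterm=""))
changed = list(difflib.unified_diff(dump(db1), dump(db3), "db1", "db3", lineterm=""))
print("\n".join(changed))
```

Here `same` is empty because only the key values differ, while `changed` reports Joe moving from USA to Belgium. The query still has to be written by hand for every relationship, so the limitation noted above applies unchanged.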

Peter Tillemans
In particular this won't check that the relationships are the same, which is probably the hardest requirement. He doesn't care about the primary keys but he does care that they are referenced in the same structure.
mpeters
A: 

I started to work on the problem myself. The general strategy is to write an SQL script which creates an "identity map" file. For each record, the map contains its primary key value (the "artificial identity") and a "natural identity" key, a string which uniquely identifies the record. Some tables in the database have a unique natural key (like the country name in my example). Other tables require an ordinal number in a sequence, and still others combine their own identity with the identity of their parent (possibly recursively, if the parent also has a parent).

All records are dumped to a second text file by a second SQL script, in a format which marks the artificial identities.

Then a perl script replaces all artificial identities in the second file with their natural identities from the map.

Then the result is sorted and diffed.
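The strategy can be sketched roughly as follows. This is a Python illustration rather than the SQL and perl scripts described above, and it collapses the map file and the dump file into one in-memory pass. The dictionaries `PRIMARY_KEY`, `FOREIGN_KEYS`, `NATURAL_KEY`, and `TABLE_ORDER` are hypothetical hand-written configuration for the two-table example; the ordinal-number case is not covered.

```python
import sqlite3

# Hypothetical, hand-written description of the example schema.
PRIMARY_KEY = {"country": "ix_country", "customer": "ix_customer"}
FOREIGN_KEYS = {"country": {}, "customer": {"ix_country": "country"}}
NATURAL_KEY = {"country": ["name"], "customer": ["name", "ix_country"]}
TABLE_ORDER = ["country", "customer"]  # parents must come before children


def canonical_dump(conn):
    """Dump every record with artificial ids replaced by natural identities."""
    identity = {}  # (table, artificial id) -> natural identity string
    lines = []
    for table in TABLE_ORDER:
        pk, fks = PRIMARY_KEY[table], FOREIGN_KEYS[table]
        cur = conn.execute(f'SELECT * FROM "{table}"')
        columns = [d[0] for d in cur.description]
        for row in cur.fetchall():
            record = dict(zip(columns, row))
            artificial_id = record.pop(pk)  # the primary key value itself is not compared
            for col, parent in fks.items():
                if record[col] is not None:
                    # Swap the foreign key value for the parent's natural identity.
                    record[col] = identity.get((parent, record[col]), "<dangling>")
            key = ",".join(str(record[c]) for c in NATURAL_KEY[table])
            natural = f"{table}({key})"
            identity[(table, artificial_id)] = natural
            fields = " ".join(f"{c}={record[c]!r}" for c in sorted(record))
            lines.append(f"{natural}: {fields}")
    return sorted(lines)


def make_db(country_id, customer_id, country_name="USA"):
    """In-memory stand-in for one copy of the database (assumed DDL)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE country (ix_country INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE customer (ix_customer INTEGER PRIMARY KEY, name TEXT,
                               ix_country INTEGER REFERENCES country(ix_country));
    """)
    conn.execute("INSERT INTO country VALUES (?, ?)", (country_id, country_name))
    conn.execute("INSERT INTO customer VALUES (?, ?, ?)", (customer_id, "Joe", country_id))
    return conn


first = canonical_dump(make_db(1, 10))        # keys 1 and 10
second = canonical_dump(make_db(1771, 27))    # keys 1771 and 27, same content
third = canonical_dump(make_db(1, 10, country_name="Canada"))
print(first == second, first == third)
```

The two copies from the question produce identical sorted dumps even though their key values differ, while a changed country name shows up both in the country line and in the customer line that refers to it. The sorted lists could be written to files and compared with diff to list the differences.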

danatel