I have two MySQL databases and I would like to write a script to compare and update data changes between them.
Does anyone know a Linux command line tool for diffing or patching data in MySQL databases?
I have two MySQL databases and I would like to write a script to compare and update data changes between them.
Does anyone know a Linux command line tool for diffing or patching data in MySQL databases?
The Way of the Brute Force: Dump both databases and diff the dumps...? ;-)
-- "If your problem is not solved by brute force, you are not applying enough force."
(I'm not (entirely) serious about this...)
As DevSolar suggests, the simple way to get the differences is to do a careful dump of the two databases (with one output file per table, and within each file, one logical line per record in the table), and apply the admirable, reliable and venerable diff
program to the files for each table. However, that 'careful' may be something of a spanner in the works - you need to ensure that each data file is dumped in a sorted order (not just a physical order), so that if a record appears in both, it appears at the same position in the file. If the data is not so ordered, you will get lots of spurious differences.
Even before you do that, you need to compare the two schemas - because many differences in the schema will automatically make every row in two tables with the same name have every row different. For example, if TableA from database DB1 has 10 columns, but TableA from Db2 has 11 columns, every row in the dumped data will be different.
You also need to worry about some other columns that can differ - notably automatically assigned ID numbers, and also 'last update time' or 'creation time' values. The automatic ID numbers in a primary key will often strongly influence the order of data in tables that join to the PK - you have to consider whether there is a good way around that. It will depend in part on the history of the databases; were they once a common database that got copied, modified, and are now being recombined? If so, there may be less problem than if they are two databases with the same schema but which have never had any common ancestry to the data stored in it.
You may find that your best bet is to create views such that the data structure reflected by the view is the same for both databases (even if the view definition is not the same because of differences in the schema). You can then compare the results of dumping those views. Done carefully, this can alleviate or minimize the differences due to automatically assigned ID numbers.
Let's pretend you managed to get comparable data and you now need to synchronize a modest number of differences. Is there a patching tool to do the job?
The answer is quite likely to the one you did not want to hear - No.
One issue is that you have to decide what is the required result of the operation. Is it the union of the two databases, or the intersection, or what? Which database are you going to modify - the first or the second, or both?
Rows deleted from one database that appear in the other can either be removed from the other or inserted into the one. Rows inserted are the mirror of rows deleted and need analogous treatment. That was easy...
Where the 'same row' appears in both databases by some criterion, but there are differences in the fields (columns), then you have a trickier job to do. You have to decide which of the different columns should be changed in the database you're currently modifying. The standard Unix tools (such as diff
) are designed for line-based differences. At this point, I'd probably drop into Perl (but Python or other scripting languages would do fine), taking the difference records for a table along with a table name and the column list (so that the fields in the data can be associated with columns in the database), and then arrange for it to generate the appropriate statements. Types may be a factor - your UPDATE statement may need to quote strings and not quote numbers for updates. You also need to know the primary key so that you can identify the row to be updated. The output would be a suitable set of UPDATE statements that would morph the first version of the table into the second.