Both databases have the same schema, but some tables may have primary key conflicts. I want the merge to just ignore the duplicate rows and continue merging the rest.

A:

Best bet would probably be going with a 3rd party application such as RedGate SQL Data Compare. Costs some money, but it's worth it over writing that script IMO.

Gromer
Is there any open-source alternative?
Ish Kumar
A:

First, a conflict of keys indicates that whatever process you are currently using is a poor one.

To correctly merge two databases that use autogenerated (non-GUID) keys, you need to take several steps. First, add a new autogenerated key to the parent table, then import all the data from both databases' parent tables, rename the old ID field to ID_old, and rename the new field to the old ID name. At this point you can move on to the child tables. You will need to copy the child tables by joining to the parent table and taking the new ID field as the value for the foreign key instead of the one in the existing table. You will need to repeat this process for every foreign key table, and if that table is also a parent table, you will need to add the conversion ID field to it before copying any data, so that you can work all the way down the chain. Doing this properly requires a great deal of knowledge of the structure of the database and lots of planning. Do not consider doing this without a good backup of both source databases. It is also best if the process can happen when both databases are in single-user mode.
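A minimal sketch of this key-remapping process, using Python's built-in sqlite3 module as a stand-in for the real databases. The customers/orders tables, the id_old and source columns, and the merge_with_new_keys helper are all hypothetical. Two details differ slightly from the description above: the new autogenerated id and the id_old column are created up front instead of being renamed afterwards, and each row records which source database it came from, because the old keys of the two sources overlap and the child join needs both values to find the right parent.

```python
import sqlite3


def merge_with_new_keys(sources):
    """Merge customers/orders from several sources into one database.

    `sources` maps a source name to (customers, orders), where customers is
    a list of (old_id, name) and orders a list of
    (old_order_id, old_customer_id, item). All names are hypothetical.
    """
    db = sqlite3.connect(":memory:")
    db.executescript("""
        -- New autogenerated key, plus the old key and the source it came from.
        CREATE TABLE customers (
            id     INTEGER PRIMARY KEY AUTOINCREMENT,
            id_old INTEGER NOT NULL,
            source TEXT NOT NULL,
            name   TEXT
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY AUTOINCREMENT,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            item        TEXT
        );
        -- Child rows that still carry their *old* foreign keys.
        CREATE TEMP TABLE orders_staging (
            source          TEXT,
            old_customer_id INTEGER,
            item            TEXT
        );
    """)
    # Import every parent row; SQLite assigns the new id automatically.
    for source, (customers, orders) in sources.items():
        db.executemany(
            "INSERT INTO customers (id_old, source, name) VALUES (?, ?, ?)",
            [(old_id, source, name) for old_id, name in customers],
        )
        # The old order id is dropped; orders get fresh ids as well.
        db.executemany(
            "INSERT INTO orders_staging VALUES (?, ?, ?)",
            [(source, cust_id, item) for _, cust_id, item in orders],
        )
    # Copy the child rows by joining to the parent on (source, old id)
    # and using the parent's *new* id as the foreign key value.
    db.execute("""
        INSERT INTO orders (customer_id, item)
        SELECT c.id, s.item
        FROM orders_staging AS s
        JOIN customers AS c
          ON c.source = s.source AND c.id_old = s.old_customer_id
    """)
    db.commit()
    return db


# Both sources use customer id 1, which would collide in a naive merge.
db = merge_with_new_keys({
    "a": ([(1, "Alice"), (2, "Bob")], [(10, 1, "lamp"), (11, 2, "desk")]),
    "b": ([(1, "Carol")], [(10, 1, "chair")]),
})
rows = db.execute("""
    SELECT c.name, o.item
    FROM orders AS o JOIN customers AS c ON c.id = o.customer_id
""").fetchall()
print(sorted(rows))  # [('Alice', 'lamp'), ('Bob', 'desk'), ('Carol', 'chair')]
```

A grandchild table would repeat the same pattern one level down, which is why any child that is also a parent needs its own old-key column before its rows are copied.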

If you use natural keys and have duplicates, you have a far different problem. All records with duplicate keys should be moved to a separate table first, and a determination should be made as to which is the more correct data. In some cases you will find that the natural key is in fact not unique (they rarely are, which is why I almost never use them), and the merged database will need to work with an autogenerated key of some type. This will involve code changes as well as database changes, so it is the option of last resort.
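A sketch of moving the duplicate natural-key records to a separate table, again with Python's sqlite3 module. It assumes a hypothetical customers_staging table that already holds the rows from both sources, with email as the natural key; customers_duplicates and split_duplicates are likewise made-up names.

```python
import sqlite3


def split_duplicates(db):
    """Move every staged row whose natural key (email) occurs more than
    once into customers_duplicates, leaving only unique rows behind."""
    db.executescript("""
        CREATE TABLE customers_duplicates AS
        SELECT * FROM customers_staging
        WHERE email IN (
            SELECT email FROM customers_staging
            GROUP BY email
            HAVING COUNT(*) > 1
        );
        DELETE FROM customers_staging
        WHERE email IN (SELECT email FROM customers_duplicates);
    """)


db = sqlite3.connect(":memory:")
# Hypothetical staging table holding the rows from both source databases.
db.execute("CREATE TABLE customers_staging (source TEXT, email TEXT, name TEXT)")
db.executemany("INSERT INTO customers_staging VALUES (?, ?, ?)", [
    ("a", "al@example.com", "Alice"),
    ("a", "bo@example.com", "Bob"),
    ("b", "al@example.com", "Alice Smith"),
])
split_duplicates(db)
print(db.execute("SELECT name FROM customers_staging").fetchall())
# [('Bob',)]
print(db.execute(
    "SELECT source, name FROM customers_duplicates ORDER BY source"
).fetchall())
# [('a', 'Alice'), ('b', 'Alice Smith')]
```

Nothing is discarded at this stage: both versions of the duplicated customer survive in the review table until someone decides which one is correct.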

What you often find with natural keys is that the data for each duplicate is different but similar ("St." versus "Street" in the address). In this case, mark one of the records for insertion and then do the insert in two steps: first the records that have no duplicates, then the records in the duplicates table that are marked for insertion. Remember that you will have to examine all records in all foreign key tables to determine which to keep and which not to keep. Just throwing out any duplicates is a bad idea: you will lose data that way, possibly critical data (such as a customer's orders). This is a long, tedious process that will require someone with expertise in the data to make the determinations. As a programmer, you should provide them with a dedup tool that lets them examine all the data for each set of duplicates and choose what to keep and what to get rid of; then, once everything is marked, it runs a process to insert the records. Remember in your design that for true duplicates, some child tables (such as orders) need the records from both duplicates attached to the record chosen as the one to enter, while for other tables (addresses, for instance) you will want to choose which one is correct. So you can see this is a complex process requiring a thorough understanding of the database.
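A sketch of the two-step insert, assuming (hypothetically) that the duplicates have already been split into customers_duplicates and that a reviewer has set keep = 1 on exactly one row per natural key. Orders from both duplicates are copied, since with a natural key they already point at the surviving record; addresses are taken only from the source of the kept record. All table, column, and function names are illustrative.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Hypothetical staging tables (natural key: email), after duplicates
    -- were split out and a reviewer set keep = 1 on one row per key.
    CREATE TABLE customers_staging (source TEXT, email TEXT, name TEXT);
    CREATE TABLE customers_duplicates (source TEXT, email TEXT, name TEXT,
                                       keep INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE orders_staging (source TEXT, email TEXT, item TEXT);
    CREATE TABLE addresses_staging (source TEXT, email TEXT, street TEXT);

    -- Target tables in the merged database.
    CREATE TABLE customers (email TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         email TEXT REFERENCES customers(email), item TEXT);
    CREATE TABLE addresses (email TEXT PRIMARY KEY
                            REFERENCES customers(email), street TEXT);

    INSERT INTO customers_staging VALUES ('a', 'bo@example.com', 'Bob');
    INSERT INTO customers_duplicates VALUES
        ('a', 'al@example.com', 'Alice', 0),
        ('b', 'al@example.com', 'Alice Smith', 1);
    INSERT INTO orders_staging VALUES
        ('a', 'al@example.com', 'lamp'),
        ('b', 'al@example.com', 'chair'),
        ('a', 'bo@example.com', 'desk');
    INSERT INTO addresses_staging VALUES
        ('a', 'al@example.com', '1 Main St.'),
        ('b', 'al@example.com', '1 Main Street'),
        ('a', 'bo@example.com', '9 Elm Rd.');
""")


def insert_reviewed(db):
    # Refuse to run until every duplicate group has exactly one row marked.
    unresolved = db.execute("""
        SELECT email FROM customers_duplicates
        GROUP BY email HAVING SUM(keep) <> 1
    """).fetchall()
    if unresolved:
        raise ValueError(f"unreviewed duplicates: {unresolved}")
    with db:  # commit on success, roll back on error
        # Step 1: records that have no duplicates.
        db.execute("INSERT INTO customers SELECT email, name FROM customers_staging")
        # Step 2: duplicates the reviewer marked for insertion.
        db.execute("INSERT INTO customers "
                   "SELECT email, name FROM customers_duplicates WHERE keep = 1")
        # Orders from *both* duplicates are kept; they share the natural
        # key, so they attach to the surviving customer.
        db.execute("INSERT INTO orders (email, item) "
                   "SELECT email, item FROM orders_staging")
        # Addresses: one per customer; for a duplicated customer, only the
        # kept record's source is used.
        db.execute("""
            INSERT INTO addresses
            SELECT a.email, a.street FROM addresses_staging AS a
            WHERE a.email NOT IN (SELECT email FROM customers_duplicates)
               OR EXISTS (SELECT 1 FROM customers_duplicates AS d
                          WHERE d.email = a.email AND d.source = a.source
                            AND d.keep = 1)
        """)


insert_reviewed(db)
print(sorted(db.execute("SELECT email, name FROM customers")))
# [('al@example.com', 'Alice Smith'), ('bo@example.com', 'Bob')]
print(sorted(db.execute("SELECT email, street FROM addresses")))
# [('al@example.com', '1 Main Street'), ('bo@example.com', '9 Elm Rd.')]
```

The guard on SUM(keep) is a design choice for the sketch: it makes the insert fail loudly rather than silently drop or double-insert a customer whose duplicates nobody has reviewed yet.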

If you have a lot of duplicates, the cleanup and data entry may go on for several months, so a tool is really critical. The people doing this will likely be system users, not database specialists or programmers, as most of the time they are the only people who can truly judge which record to keep. You will likely need to do something similar in any event, as there may be records that are duplicates even when you have an autogenerated key; they are just more difficult to find.
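Duplicates hidden behind different autogenerated keys can be surfaced by comparing normalized values instead of keys. This is a minimal sketch of one such check, not a complete dedup tool: the abbreviation table, the normalize and candidate_duplicates helpers, and the choice of name plus address as the comparison key are all assumptions, and the output is only a list of candidate groups for a person to review.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; a real one would be longer and would
# need care with ambiguous cases ("St." can also mean "Saint").
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def normalize(text):
    """Lower-case, drop punctuation, and expand known abbreviations."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def candidate_duplicates(records):
    """Group (id, name, address) records whose normalized name and address
    match. The groups are only candidates for a person to review; nothing
    is merged or deleted automatically."""
    groups = defaultdict(list)
    for rec_id, name, address in records:
        groups[(normalize(name), normalize(address))].append(rec_id)
    return [ids for ids in groups.values() if len(ids) > 1]


# Autogenerated ids 1 and 2 differ, but the rows describe the same customer.
records = [
    (1, "Alice Smith", "1 Main St."),
    (2, "alice smith", "1 Main Street"),
    (3, "Bob Jones", "9 Elm Rd."),
]
print(candidate_duplicates(records))  # [[1, 2]]
```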

There is no easy way to merge two databases (even with GUIDs, you have the problem of duplicates in the natural key).

HLGEM
A: 

If your primary keys are IDENTITY columns, here is my suggestion (it shouldn't require modifying the schema):

  1. Set up all foreign keys so that ON UPDATE CASCADE is set
  2. Update the primary key / IDENTITY field in the parent table by adding to it the max value of that field in the corresponding table you are going to merge into (the FKs will then cascade the new values to the child tables)
  3. Do the same for the PK / IDENTITY fields in the child tables
  4. Follow the suggestion from this forum answer and use SET IDENTITY_INSERT <table> ON / OFF around the inserts into each table, starting with the parent table and then moving on to the child tables
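A minimal sketch of the offset arithmetic behind these steps, using Python's sqlite3 module rather than SQL Server; the parent/child tables and the merge_with_offset helper are hypothetical. SQLite's INTEGER PRIMARY KEY accepts explicit values, so no IDENTITY_INSERT equivalent is needed here. One deliberate deviation: the sketch adds the offset while copying (INSERT ... SELECT id + offset) instead of updating the keys in place and relying on ON UPDATE CASCADE, because SQL Server rejects an UPDATE of an IDENTITY column; on SQL Server, the same INSERT ... SELECT would run between SET IDENTITY_INSERT <table> ON and OFF.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# The source database is attached next to the target ("main").
db.execute("ATTACH DATABASE ':memory:' AS src")
for schema in ("main", "src"):
    db.executescript(f"""
        CREATE TABLE {schema}.parent (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE {schema}.child (
            id        INTEGER PRIMARY KEY,
            parent_id INTEGER REFERENCES parent(id),
            item      TEXT
        );
    """)
db.executescript("""
    INSERT INTO main.parent VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO main.child  VALUES (1, 1, 'lamp');
    -- The source reuses ids 1 and 2, which would collide on a plain copy.
    INSERT INTO src.parent VALUES (1, 'Carol');
    INSERT INTO src.child  VALUES (1, 1, 'chair'), (2, 1, 'desk');
""")


def merge_with_offset(db):
    # Offsets: the current max key in each target table (0 if empty),
    # read before any rows are copied.
    p_off = db.execute("SELECT COALESCE(MAX(id), 0) FROM main.parent").fetchone()[0]
    c_off = db.execute("SELECT COALESCE(MAX(id), 0) FROM main.child").fetchone()[0]
    with db:  # commit on success, roll back on error
        # Parent first: shift its keys past the target's range.
        db.execute(
            "INSERT INTO main.parent (id, name) SELECT id + ?, name FROM src.parent",
            (p_off,),
        )
        # Then the child: shift both its own key and its foreign key.
        db.execute(
            "INSERT INTO main.child (id, parent_id, item) "
            "SELECT id + ?, parent_id + ?, item FROM src.child",
            (c_off, p_off),
        )


merge_with_offset(db)
print(db.execute("""
    SELECT p.name, c.item
    FROM main.child AS c JOIN main.parent AS p ON p.id = c.parent_id
    ORDER BY c.id
""").fetchall())
# [('Alice', 'lamp'), ('Carol', 'chair'), ('Carol', 'desk')]
```

Note that this only resolves surrogate-key collisions; as the previous answer points out, it does nothing about rows that are duplicates by their natural key.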
icc97