ansaurus

Question

Merging databases how to handle duplicate PK's

Answer 1

A:

Hi,

One thing you could do is set up the tables with regional data to use GUID's. That way, the primary keys in each region are unique, and you can mix and match data (import data from one region to another). For the tables which have shared data (like type tables), you can keep the primary keys the way they are (since they should be the same everywhere).

Here is some information about GUID's: http://www.sqlteam.com/article/uniqueidentifier-vs-identity

Maybe SQL Server Management Studio lets you convert columns to use GUID's easily. I hope so!

Best of luck.

Jon 2009-07-07 19:02:54

Answer 2

+1 A:

I have no first-hand experience with this, but it seems to me like you ought to be able to uniquely map PK -> New PK for each server. For instance, generate new PKs such that data from LA server has PK % 3 == 2, SF has PK % 3 == 1, and NY has PK % 3 == 0. And since, as I understood your question anyway, each server only stores FK relationships to its own data, you can update the FKs in identical fashion.

NewLA = OldLA*3-1
NewSF = OldLA*3-2
NewNY = OldLA*3

You can then merge those and have no duplicate PKs. This is essentially, as you already said, just generating new PKs, but structuring it this way allows you to trivially update your FKs (assuming, as I did, that the data on each server is isolated). Good luck.

Nick Lewis 2009-07-07 19:13:20

Answer 3

A:

BEST: add a column for RegionCode, and include it on your PKs, but you don't want to do all the leg work.

HACK: if your IDs are INTs, a quick fix would be to add a fixed value based on region to each key on import. INTs can be as large as: 2,147,483,647

local server data:

LA IDs: 1,2,3,4,5,6
SF IDs: 1,2,3,4,5
NY IDs: 1,2,3,4,5,6,7,9

add 100000000 to LA's IDs
add 200000000 to SF's IDs
add 300000000 to NY's IDs

combined server data:

LA IDs: 100000001,100000002,100000003,100000004,100000005,100000006
SF IDs: 200000001,200000002,200000003,200000004,200000005
NY IDs: 300000001,300000002,300000003,300000004,300000005,300000006,300000007,300000009

KM 2009-07-07 19:19:07

Answer 4

A:

what i have done in a situation like this is this:

create a new db with the same schema but only tables. no pk fk, checks etc.
transfer data from DB1 to this source db
for each table in target database find the top number for the PK
for each table in the source database update their pk, fk etc starting with the (top number + 1) from the target db
for each table in target database set identity insert to on
import data from source db to target db
for each table in target database set identity insert to off
clear source db
repeat for DB2

Mladen Prajdic 2009-07-07 19:23:31

Answer 5

A:

As Jon mentioned, I would use GUIDs to solve the merge task. And I see two different solutions that required GUIDs:

1) Permanently change your database schema to use GUIDs instead of INTEGER (IDENTITY) as primary key.

This is a good solution in general, but if you have a lot of non SQL code that is somehow bound to the way your identifiers work, it could require quite some code changes. Probably since you merge databases, you may anyways need to update your application so that it is working with one region data only based on the user logged in etc.

2) Temporarily add GUIDs for migration purposes only, and after the data is migrated, drop them:

This one is kind-of more tricky, but once you write this migration script, you can (re-)run it multiple times to merge databases again in case you screw it the first time. Here is an example:

Table: PERSON (ID INT PRIMARY KEY, Name VARCHAR(100) NOT NULL)
Table: ADDRESS (ID INT PRIMARY KEY, City VARCHAR(100) NOT NULL, PERSON_ID INT)

Your alter scripts are (note that for all PK we automatically generate the GUID):

ALTER TABLE PERSON ADD UID UNIQUEIDENTIFIER NOT NULL DEFAULT (NEWID())
ALTER TABLE ADDRESS ADD UID UNIQUEIDENTIFIER NOT NULL DEFAULT (NEWID())
ALTER TABLE ADDRESS ADD PERSON_UID UNIQUEIDENTIFIER NULL

Then you update the FKs to be consistent with INTEGER ones:

--// set ADDRESS.PERSON_UID
UPDATE  ADDRESS
SET     ADDRESS.PERSON_UID = PERSON.UID
FROM    ADDRESS
INNER JOIN PERSON
    ON  ADDRESS.PERSON_ID = PERSON.ID

You do this for all PKs (automatically generate GUID) and FKs (update as shown above).

Now you create your target database. In this target database you also add the UID columns for all the PKs and FKs. Also disable all FK constraints.

Now you insert from each of your source databases to the target one (note: we do not insert PKs and integer FKs):

INSERT INTO TARGET_DB.dbo.PERSON (UID, NAME)
SELECT UID, NAME FROM SOURCE_DB1.dbo.PERSON

INSERT INTO TARGET_DB.dbo.ADDRESS (UID, CITY, PERSON_UID)
SELECT UID, CITY, PERSON_UID FROM SOURCE_DB1.dbo.ADDRESS

Once you inserted data from all the databases, you run the code opposite to the original to make integer FKs consistent with GUIDs on the target database:

--// set ADDRESS.PERSON_ID
UPDATE  ADDRESS
SET     ADDRESS.PERSON_ID = PERSON.ID
FROM    ADDRESS
INNER JOIN PERSON
    ON  ADDRESS.PERSON_UID = PERSON.UID

Now you may drop all the UID columns: ALTER TABLE PERSON DROP COLUMN UID ALTER TABLE ADDRESS DROP COLUMN UID ALTER TABLE ADDRESS DROP COLUMN PERSON_UID

So at the end you should get a rather long migration script, that should do the job for you. The point is - IT IS DOABLE

NOTE: all written here is not tested.

van 2009-07-07 19:55:25

Basically the biggest problem are the relations between tables, and the *workflow* in the answer shows how to solve it. Good luck!

van 2009-07-07 19:56:38

Answer 6

+1 A:

I have done this and I say change your keys (pick a method) rather than changing your code. Invariably you will either miss a stored procedure or introduce a bug. With data changes, it is pretty easy to write tests to look for orphaned records or to verify that things were matched up correctly. With code changes, especially code that is working correctly, it is too easy to miss something.

Bill 2009-07-07 19:56:10

ansaurus

tags:

views:

answers:

Merging databases how to handle duplicate PK's

1) Permanently change your database schema to use GUIDs instead of INTEGER (IDENTITY) as primary key.

2) Temporarily add GUIDs for migration purposes only, and after the data is migrated, drop them:

related questions