views:

329

answers:

6

We have 3 databases that are physically separated by region, one in LA, SF and NY. All the databases share the same schema but contain data specific to their region. We're looking to merge these databases into one and mirror it. We need to preserve the data for each region but merge them into one db. This presents quite a few issues for us, for example we will certainly have duplicate Primary Keys, and Foreign Keys will be potentially invalid.

I'm hoping to find someone who has had experience with a task like this who could provide some tips, strategies and words of experience on how we can accomplish the merge.

For example, one idea was to create composite keys and then change our code and sprocs to find the data via the composite key (region/original pk). But this requires us to change all of our code and sprocs.

Another idea was to just import the data and let it generate new PK's and then update all the FK references to the new PK. This way we potentially don't have to change any code.

Any experience is welcome, thanks!

A: 

Hi,

One thing you could do is set up the tables with regional data to use GUID's. That way, the primary keys in each region are unique, and you can mix and match data (import data from one region to another). For the tables which have shared data (like type tables), you can keep the primary keys the way they are (since they should be the same everywhere).

Here is some information about GUID's: http://www.sqlteam.com/article/uniqueidentifier-vs-identity

Maybe SQL Server Management Studio lets you convert columns to use GUID's easily. I hope so!

Best of luck.

Jon
+1  A: 

I have no first-hand experience with this, but it seems to me like you ought to be able to uniquely map PK -> New PK for each server. For instance, generate new PKs such that data from LA server has PK % 3 == 2, SF has PK % 3 == 1, and NY has PK % 3 == 0. And since, as I understood your question anyway, each server only stores FK relationships to its own data, you can update the FKs in identical fashion.

NewLA = OldLA*3-1
NewSF = OldLA*3-2
NewNY = OldLA*3

You can then merge those and have no duplicate PKs. This is essentially, as you already said, just generating new PKs, but structuring it this way allows you to trivially update your FKs (assuming, as I did, that the data on each server is isolated). Good luck.

Nick Lewis
A: 

BEST: add a column for RegionCode, and include it on your PKs, but you don't want to do all the leg work.

HACK: if your IDs are INTs, a quick fix would be to add a fixed value based on region to each key on import. INTs can be as large as: 2,147,483,647

local server data:

LA IDs: 1,2,3,4,5,6
SF IDs: 1,2,3,4,5
NY IDs: 1,2,3,4,5,6,7,9

add 100000000 to LA's IDs
add 200000000 to SF's IDs
add 300000000 to NY's IDs

combined server data:

LA IDs: 100000001,100000002,100000003,100000004,100000005,100000006
SF IDs: 200000001,200000002,200000003,200000004,200000005
NY IDs: 300000001,300000002,300000003,300000004,300000005,300000006,300000007,300000009
KM
A: 

what i have done in a situation like this is this:

  1. create a new db with the same schema but only tables. no pk fk, checks etc.
  2. transfer data from DB1 to this source db
  3. for each table in target database find the top number for the PK
  4. for each table in the source database update their pk, fk etc starting with the (top number + 1) from the target db
  5. for each table in target database set identity insert to on
  6. import data from source db to target db
  7. for each table in target database set identity insert to off
  8. clear source db
  9. repeat for DB2
Mladen Prajdic
A: 

As Jon mentioned, I would use GUIDs to solve the merge task. And I see two different solutions that required GUIDs:

1) Permanently change your database schema to use GUIDs instead of INTEGER (IDENTITY) as primary key.

This is a good solution in general, but if you have a lot of non SQL code that is somehow bound to the way your identifiers work, it could require quite some code changes. Probably since you merge databases, you may anyways need to update your application so that it is working with one region data only based on the user logged in etc.

2) Temporarily add GUIDs for migration purposes only, and after the data is migrated, drop them:

This one is kind-of more tricky, but once you write this migration script, you can (re-)run it multiple times to merge databases again in case you screw it the first time. Here is an example:

Table: PERSON (ID INT PRIMARY KEY, Name VARCHAR(100) NOT NULL)
Table: ADDRESS (ID INT PRIMARY KEY, City VARCHAR(100) NOT NULL, PERSON_ID INT)

Your alter scripts are (note that for all PK we automatically generate the GUID):

ALTER TABLE PERSON ADD UID UNIQUEIDENTIFIER NOT NULL DEFAULT (NEWID())
ALTER TABLE ADDRESS ADD UID UNIQUEIDENTIFIER NOT NULL DEFAULT (NEWID())
ALTER TABLE ADDRESS ADD PERSON_UID UNIQUEIDENTIFIER NULL

Then you update the FKs to be consistent with INTEGER ones:

--// set ADDRESS.PERSON_UID
UPDATE  ADDRESS
SET     ADDRESS.PERSON_UID = PERSON.UID
FROM    ADDRESS
INNER JOIN PERSON
    ON  ADDRESS.PERSON_ID = PERSON.ID

You do this for all PKs (automatically generate GUID) and FKs (update as shown above).

Now you create your target database. In this target database you also add the UID columns for all the PKs and FKs. Also disable all FK constraints.

Now you insert from each of your source databases to the target one (note: we do not insert PKs and integer FKs):

INSERT INTO TARGET_DB.dbo.PERSON (UID, NAME)
SELECT UID, NAME FROM SOURCE_DB1.dbo.PERSON

INSERT INTO TARGET_DB.dbo.ADDRESS (UID, CITY, PERSON_UID)
SELECT UID, CITY, PERSON_UID FROM SOURCE_DB1.dbo.ADDRESS

Once you inserted data from all the databases, you run the code opposite to the original to make integer FKs consistent with GUIDs on the target database:

--// set ADDRESS.PERSON_ID
UPDATE  ADDRESS
SET     ADDRESS.PERSON_ID = PERSON.ID
FROM    ADDRESS
INNER JOIN PERSON
    ON  ADDRESS.PERSON_UID = PERSON.UID

Now you may drop all the UID columns: ALTER TABLE PERSON DROP COLUMN UID ALTER TABLE ADDRESS DROP COLUMN UID ALTER TABLE ADDRESS DROP COLUMN PERSON_UID

So at the end you should get a rather long migration script, that should do the job for you. The point is - IT IS DOABLE

NOTE: all written here is not tested.

van
Basically the biggest problem are the relations between tables, and the *workflow* in the answer shows how to solve it. Good luck!
van
+1  A: 

I have done this and I say change your keys (pick a method) rather than changing your code. Invariably you will either miss a stored procedure or introduce a bug. With data changes, it is pretty easy to write tests to look for orphaned records or to verify that things were matched up correctly. With code changes, especially code that is working correctly, it is too easy to miss something.

Bill