views:

249

answers:

5

I have a basic db schema comprising 2 tables; One is a simple ID -> Text list of terms, and the other has 2 columns, parent and child. The ids in the first table are generated on insert by a db sequence while the second table contains a mapping between keys to store the 'structure' of the hierarchy.

My problem is that I may want to sometimes move a tree from one db to another. If I have 2 DBs, each with 10 terms in (Database A's terms != Database B's terms, and there's no overlap), and I just copy the data from A to B then I'll get an obvious problem that the terms will be renumbered but the relationships wont. Clearly in this example just adding 10 to all the relationship keys will work, but does anyone know of a general algorithm to do this?

The DB is oracle 11g, and an oracle specific solution is fine...

+5  A: 

Quick answer

Import into a staging table, but populate mapped ID values from the same sequence used to produce ID values from the destination table. This is guaranteed to avoid conflicts between ID values as DBMS engine supports concurrent access to sequences.

With the ID values on the node mapped (see below) re-mapping the ID values for the edges is trivial.

Longer answer

You will need a mechanism that maps the values between the old keys from the source and new keys in the destination. The way to do this is to create intermediate staging tables that hold the mappings between the old and new kays.

In Oracle, autoincrementing keys are usually done with sequences in much the way you've described. You need to construct staging tables with a placeholder for the 'old' key so you can do the re-mapping. Use the same sequence as used by the application to populate the ID values on actual destination database tables. The DBMS allows concurrent accesses to sequences and using the same sequence guarantees that you will not get collisions in the mapped ID values.

If you have a schema like:

create table STAGE_NODE (
       ID int
      ,STAGED_ID int
)
/

create table STAGE_EDGE (
       FROM_ID   int
      ,TO_ID     int
      ,OLD_FROM_ID int
      ,OLD_TO_ID int
)
/

This will allow you to import into the STAGE_NODE table, preserving the imported key values. The insert process puts the original ID from the imported table into STAGED_ID and populates ID from the sequence.

Make sure you use the same sequence that's used for populating the ID column in the destination table. This ensures that you won't get key collisions when you go to insert to the final destination table. It is important to re-use the same sequence.

As a useful side effect this will also allow the import to run while other operations are taking place on the table; concurrent reads on a single sequence are fine. If necessary you can run this type of import process without bringing down the applciation.

Once you have this mapping in the staging table, ID values in the EDGE table are trivial to compute with a query like:

select node1.ID         as FROM_ID
      ,node2.ID         as TO_ID
  from STAGE_EDGE se
  join STAGE_NODE node1
    on node1.STAGED_ID = se.OLD_FROM_ID
  join STAGE_NODE node2
    on node2.STAGED_ID = se.OLD_TO_ID 

The mapped EDGE values can be populated back into the staging tables using an UPDATE query with a similar join or inserted directly into the destination table from a query similar to the one above.

ConcernedOfTunbridgeWells
A: 

I used to do this kind of thing a lot, but my memory is a bit hazy. I'll give you the general idea, hope it can get you pointed in the right direction.

Basically you can only do this if you've got a reliable second 'unique key' column in the 'parent' table. If not, you'll need to create one.

Say we have these tables

ITEMS[id, A, key] //id: 'real' id, A: just some column, key: the alternate key

HIERARCHY[idparent, idchild]

What you want to do is first copy ITEMS from SOURCEDB to TARGETDB, letting TARGETDB create its own values for the id column.

Then you need to copy HIERARCHY from SOURCEDB to TARGETDB, but you need to do a join like so to get the new id:

SOURCEDB.HIERARCHY.idparent 
      -> SOURCEDB.ITEMS.id 
      -> SOURCEDB.ITEMS.key 
      -> TARGETDB.ITEMS.key 
      -> TARGETDB.ITEMS.id

And you need to do the SAME thing for the idchild column.

This will give something like this (untested, and rusty, and probably mssql syntax):

//step 1
INSERT TARGETDB.ITEMS(A, key)
SELECT A, key FROM SOURCEDB.ITEMS

//step 2
INSERT TARGETDB.HIERARCHY(idparent, idchild)
SELECT T1.id, T2.id
FROM SOURCEDB.HIERARCHY AS H1
     INNER JOIN SOURCEDB.ITEMS AS I1 ON H1.idparent = I1.id
     INNER JOIN TARGETDB.ITEMS AS T1 ON I1.key = T1.key
     INNER JOIN SOURCEDB.ITEMS AS I2 ON H1.idchild = I2.id
     INNER JOIN TARGETDB.ITEMS AS T2 ON I2.key = T2.key

I'm presuming that these two databases are 'connected' enough that you can do cross-database queries. If you have to serialise to file, it gets a bit more ... complicated.

Benjol
A: 

You can achieve what you need using a temp table in the target database. Since IDs are auto-generated, the following code will not generate any collisions.

I'm going to assume the source database is called SourceDb and the target database is called TargetDb. I'm also going to assum this table structure:
Terms: ID, Text
Relationships: ParentId, ChildId

Create a temporary table in TargetDb with this structure:
TempTerms: OldId, Text, OldParentId, NewId, NewParentId

The following code will copy your subtree to the target database.

declare
    RootOfSubtreeId SourceDb.Terms.Id%type;
    TermCursor sys_refcursor;
begin
    --//Copy the data from SourceDb into the TargetDb temp table.
    --//This query gets the entire subtree of data with the root of the subtree having ID=RootOfSubTreeId.
    insert into TargetDb.TempTerms
    (
        OldId, Text, OldParentId
    )
    with RelatedTerms as
    (
        select
            T.ID, T.Text, R.ParentId
        from
            SourceDb.Terms T
            join SourceDb.Relationships R
            on R.ChildId = T.ID
    )
    select
        ID,
        Text,
        ParentId
    from
        RelatedTerms
    connect by
        prior ID = ParentId
    start with
        ID = RootOfSubtreeId;

    --//Open a cursor to loop over all of the temporary data.
    open TermCursor for
    select
        *
    from
        TargetDb.TempTerms;

    for term in TermCursor
    loop
        --//Insert the item into TargetDb's Terms table and get the new id back.
        insert into TargetDb.Terms
        ( ID, Text )
        values
        ( term.Text )
        returning ID into NewTermId;

        --//Update the temp table's NewId column for the newly inserted row.
        update TargetDb.TempTerms
        set    NewId = NewTermId
        where  OldId = term.OldId;

        --//Update the temp table's NewParentId column for all children of the newly inserted row.
        update TargetDb.TempTerms
        set    NewParentId = NewTermId
        where  OldParentId = term.OldId;
    end loop;

    --//Add all relationship data to TargetDb using the new IDs found above.
    insert into TargetDb.Relationships
    ( ParentId, ChildId )
    select
        NewParentId, NewId
    from
        TargetDb.TempTerms
    where
        NewParentId is not null;
end;
Aaron
+3  A: 

Overview

I will give four solutions, starting with the simplest. With each solution I will explain the situations in which it would be applicable.

Each of these solutions assumes that databases A and B have the following tables:

create table Terms
(
  ID int identity(1,1),
  Text nvarchar(MAX)
)

create table Relationships
(
  ParentID int,
  ChildID int
)

Solution 1

This is the simplest solution. It should be used if:

  • Terms with identical text may be merged together

The following will merge all terms and relationships from A into B:

insert into A.Terms (Text)
  select Text
  from A.Terms
  where Text not in (select Text from B.Terms)

insert into B.Relationships (ParentID, ChildID)
  select
    (select ID
     from B.Terms BTerms inner join A.Terms ATerms on BTerms.Text = ATerms.Text
     where ATerms.ID = Relationships.ParentID),
    (select ID
     from B.Terms BTerms inner join A.Terms ATerms on BTerms.Text = ATerms.Text
     where ATerms.ID = Relationships.ChildID)
  from A.Relationships

Basically you first copy the terms, then copy the relationships mapping the old id to the new id based on the text.

Note: In your question you state the terms are disjoint between the two input databases. In that case the where clause in the first insert into may be omitted.

Solution 2

This is the next-simplest solution. It should be used if:

  • Terms with the same Text must be kept distinct, and
  • You can add a column to the destination table

First add an int column to your Terms table called "OldID", then use the following to merge all terms and relationships from A to B:

insert into A.Terms (Text, OldID)
  select Text, ID
  from A.Terms
  where Text not in (select Text from B.Terms)

insert into B.Relationships (ParentID, ChildID)
  select
    (select ID from B.Terms where OldID = ParentID),
    (select ID from B.Terms where OldID = ChildID)
  from A.Relationships

Solution 3

This solution uses iteration. It should be used if:

  • Terms with the same Text must be kept distinct, and
  • You cannot modify the destination table, and
  • Either (a) your ID column is an identity column (in Oracle, this means it has a trigger that uses a sequence), or (b) you want a general method that will work with any database technology

The following will merge all terms and relationships from A into B:

declare TermsCursor sys_refcursor; 
begin 

-- Create temporary mapping table
create table #Temporary (OldID int, NewID int)

-- Add terms one at a time, remembering the id mapping
open TermsCursor for select * from A.Terms;
for term in TermsCursor 
loop
  insert into B.Terms (Text) values ( term.Text ) returning ID into NewID;
  insert into Temporary ( OldID, NewID ) values ( term.ID, NewID );
end loop; 

-- Transfer the relationships
insert into B.Relationships (ParentID, ChildID)
  select
    (select ID
     from B.Terms BTerms inner join Temporary on BTerms.ID = Temporary.NewID
     where Temporary.OldID = Relationships.ParentID),
    (select ID
     from B.Terms BTerms inner join Temporary on BTerms.ID = Temporary.NewID
     where Temporary.OldID = Relationships.ChildID),
  from A.Relationships

-- Drop the temporary table
drop table #Temporary

end

Solution 4

This solution is Oracle-specific, requires you to know the sequence used to generate ID values, and is less efficient than some of the other solutions. It should be used if:

  • Terms with the same Text must be kept distinct, and
  • You cannot modify the destination table, and
  • You have access to the sequence that generates your ID column, and
  • You are ok using a techinique that will not port to a non-Oracle database technology

The following will merge all terms and relationships from A into B:

-- Create temporary mapping table
create table #Temporary (OldID int, NewID int)

-- Add terms to temporary mapping table
insert into #Tempoarary ( OldID, NewID )
select ID, sequence.nexval
from A.Terms

-- Transfer the terms
insert into B.Terms ( ID, Text )
select NewID, Text
from A.Terms inner join Temporary on ID = OldID

-- Transfer the relationships
insert into B.Relationships (ParentID, ChildID)
  select
    (select ID
     from B.Terms BTerms inner join Temporary on BTerms.ID = Temporary.NewID
     where Temporary.OldID = Relationships.ParentID),
    (select ID
     from B.Terms BTerms inner join Temporary on BTerms.ID = Temporary.NewID
     where Temporary.OldID = Relationships.ChildID),
  from A.Relationships

-- Drop the temporary table
drop table #Temporary
Ray Burns
Oracle doesn't have anything equivalent to identity columns (at least it didn't when I last used it). This might be relevant to SQL Server but you wouldn't need to do it on Oracle.
ConcernedOfTunbridgeWells
Actually Oracle does have an equivalent to identity columns: A trigger together with a sequence. A well-designed database will use this. Visage may need to use my `Solution 3` in Oracle if he doesn't have access to the sequence that generates the ID value. Also in common cases he can get by with `Solution 1` or `Solution 2` which are both simpler and more efficient because they don't create a temporary table. For situations in which the sequence *is* accessible, I have added a `Solution 4` that shows how to use it to avoid the iteration.
Ray Burns
A: 

What about passing around the data as XML? It is naturally designed to work with tree structures, and many DBMSes include good support for XML parsing and conversion.

You'll have to ensure node X in DB1 maps to node Y in DB 2, but for that you should use some fact about the node (a name, etc) beyond the primary key.

You can also offset the keys for each DB by a regular amount (say 2^32) and use a BIG INTEGER key. Limits entries to 2^32, but helpful still.

(I may be misunderstanding the question here, but I hope not.)

BobMcGee