views:

401

answers:

6

Hi,

let's assume i have a self referencing hierarchical table build the classical way like this one:

CREATE TABLE test (name text,id serial primary key,parent_id integer references test)

insert into test (name,id,parent_id) values ('root1',1,NULL),('root2',2,NULL),('root1sub1',3,1),('root1sub2',4,1),('root 2sub1',5,2),('root2sub2',6,2)

testdb=# select * from test;

name | id | parent_id -----------+----+-----------

root1 | 1 |

root2 | 2 |

root1sub1 | 3 | 1

root1sub2 | 4 | 1

root2sub1 | 5 | 2

root2sub2 | 6 | 2

What i need now is a function (preferrably in plain sql) that would take the id of a test record and clone all attached records (including the given one). The cloned records need to have new ids of course. The desired result would like this for example:

Select * from cloningfunction(2);

name | id | parent_id

-----------+----+-----------

root2 | 7 |
root2sub1 | 8 | 7

root2sub2 | 9 | 7

Any pointers? Im using PostgreSQL 8.3.

thanks!

+5  A: 

Pulling this result in recursively is tricky (although possible). However, it's typically not very efficient and there is a much better way to solve this problem.

Basically, you augment the table with an extra column which traces the tree to the top - I'll call it the "Upchain". It's just a long string that looks something like this:

name | id | parent_id | upchain
root1 | 1 | NULL | 1:
root2 | 2 | NULL | 2:
root1sub1 | 3 | 1 | 1:3:
root1sub2 | 4 | 1 | 1:4:
root2sub1 | 5 | 2 | 2:5:
root2sub2 | 6 | 2 | 2:6:
root1sub1sub1 | 7 | 3 | 1:3:7:

It's very easy to keep this field updated by using a trigger on the table. (Apologies for terminology but I have always done this with SQL Server). Every time you add or delete a record, or update the parent_id field, you just need to update the upchain field on that part of the tree. That's a trivial job because you just take the upchain of the parent record and append the id of the current record. All child records are easily identified using LIKE to check for records with the starting string in their upchain.

What you're doing effectively is trading a bit of extra write activity for a big saving when you come to read the data.

When you want to select a complete branch in the tree it's trivial. Suppose you want the branch under node 1. Node 1 has an upchain '1:' so you know that any node in the branch of the tree under that node must have an upchain starting '1:...'. So you just do this:

SELECT *
FROM table
WHERE upchain LIKE '1:%'

This is extremely fast (index the upchain field of course). As a bonus it also makes a lot of activities extremely simple, such as finding partial trees, level within the tree, etc.

I've used this in applications that track large employee reporting hierarchies but you can use it for pretty much any tree structure (parts breakdown, etc.)

Notes (for anyone who's interested):

  • I haven't given a step-by-step of the SQL code but once you get the principle, it's pretty simple to implement. I'm not a great programmer so I'm speaking from experience.
  • If you already have data in the table you need to do a one time update to get the upchains synchronised initially. Again, this isn't difficult as the code is very similar to the UPDATE code in the triggers.
  • This technique is also a good way to identify circular references which can otherwise be tricky to spot.
njreed.myopenid.com
If I could upvote this twice, I would... Very nice!
Mike G.
I think that the trigger must update all the children records of the hierarchy. E.g. if we update parent_id of record "root2" to 1, we must also update records "root2sub1" and "root2sub2".
Panos
This is the Materialized Path way... very handy indeed
Camilo Díaz
A: 

This sounds like an exercise from "SQL For Smarties" by Joe Celko...

I don't have my copy handy, but I think it's a book that'll help you quite a bit if this is the kind of problems you need to solve.

Mike G.
+3  A: 

The Joe Celko's method which is similar to the njreed's answer but is more generic can be found here:

Panos
A: 

Thanks a lot for those answers! I actually do possess Joe Celko's "SQL for Smarties" (though not the one about hierarchical structures, so thanks, Panos, for those links) and even consulted it before posting my question. Unfortunately he doesn't give advice on how to actually clone hierachical data. Clone, as in, "See, I got this unbalanced tree with lots of subnodes and subsubnodes. I need another one just like that." I see how using the enumerate path model or the nested sets approach give me effective ways to select and manipulate trees in various ways, but i haven't quite figured out how to do cloning of branches or entire trees...

+1  A: 

@Maximilian: You are right, we forgot your actual requirement. How about a recursive stored procedure? I am not sure if this is possible in PostgreSQL, but here is a working SQL Server version:

CREATE PROCEDURE CloneNode
    @to_clone_id int, @parent_id int
AS
    SET NOCOUNT ON
    DECLARE @new_node_id int, @child_id int

    INSERT INTO test (name, parent_id) 
     SELECT name, @parent_id FROM test WHERE id = @to_clone_id
    SET @new_node_id = @@IDENTITY

    DECLARE @children_cursor CURSOR
    SET @children_cursor = CURSOR FOR 
     SELECT id FROM test WHERE parent_id = @to_clone_id

    OPEN @children_cursor
    FETCH NEXT FROM @children_cursor INTO @child_id
    WHILE @@FETCH_STATUS = 0
    BEGIN
     EXECUTE CloneNode @child_id, @new_node_id
     FETCH NEXT FROM @children_cursor INTO @child_id
    END
    CLOSE @children_cursor
    DEALLOCATE @children_cursor

Your example is accomplished by EXECUTE CloneNode 2, null (the second parameter is the new parent node).

Panos
A: 

Looks cool, I'll give that a try. Unfortunately I lack rep points to vote up your answer (yet...).

Thanks!

Maximilian Tyrtania