views:

4548

answers:

7

Simpler Example

Let's try a simpler example, so people can wrap their heads around the concepts, and have a practical example that you can copy&paste into SQL Query Analizer:

Imagine a Nodes table, with a heirarchy:

A
 - B
    - C

We can start testing in Query Analizer:

CREATE TABLE ##Nodes
(
 NodeID varchar(50) PRIMARY KEY NOT NULL,
 ParentNodeID varchar(50) NULL
)

INSERT INTO ##Nodes (NodeID, ParentNodeID) VALUES ('A', null)
INSERT INTO ##Nodes (NodeID, ParentNodeID) VALUES ('B', 'A')
INSERT INTO ##Nodes (NodeID, ParentNodeID) VALUES ('C', 'B')

Desired output:

ParentNodeID    NodeID    GenerationsRemoved
============    ======    ==================
NULL            A         1
NULL            B         2
NULL            C         3
A               B         1
A               C         2
B               C         1

Now the suggested CTE expression, with it's incorrect output:

WITH NodeChildren AS
(
   --initialization
   SELECT ParentNodeID, NodeID, 1 AS GenerationsRemoved
   FROM ##Nodes
   WHERE ParentNodeID IS NULL

   UNION ALL

   --recursive execution
   SELECT P.ParentNodeID, N.NodeID, P.GenerationsRemoved + 1
   FROM NodeChildren AS P
      INNER JOIN ##Nodes AS N
      ON P.NodeID = N.ParentNodeID
)
SELECT ParentNodeID, NodeID, GenerationsRemoved
FROM NodeChildren

Actual output:

ParentNodeID    NodeID    GenerationsRemoved
============    ======    ==================
NULL            A         1
NULL            B         2
NULL            C         3

Note: If SQL Server 2005† CTE cannot do what i was doing before in 2000‡, that's fine, and that's the answer. And whoever gives "it's not possible" as the answer will win the bounty. But i will wait a few days to make sure everyone concur's that it's not possible before i irrecovably give 250 reputation for a non-solution to my problem.

Nitpickers Corner

†not 2008

‡without resorting to a UDF*, which is the solution already have

*unless you can see a way to improve the performance of the UDF in the original question


Original Question

i have a table of Nodes, each with a parent that points to another Node (or to null).

For illustration:

1 My Computer
    2 Drive C
         4 Users
         5 Program Files
         7 Windows
             8 System32
    3 Drive D
         6 mp3

i want a table that returns all the parent-child relationships, and the number of generations between them

For for all direct parent relationships:

ParentNodeID  ChildNodeID  GenerationsRemoved
============  ===========  ===================
(null)        1            1
1             2            1
2             4            1
2             5            1
2             7            1
1             3            1
3             6            1
7             8            1

But then there's the grandparent relationships:

ParentNodeID  ChildNodeID  GenerationsRemoved
============  ===========  ===================
(null)        2            2
(null)        3            2
1             4            2
1             5            2
1             7            2
1             6            2
2             8            2

And the there's the great-grand-grandparent relationships:

ParentNodeID  ChildNodeID  GenerationsRemoved
============  ===========  ===================
(null)        4            3
(null)        5            3
(null)        7            3
(null)        6            3
1             8            3

So i can figure out the basic CTE initialization:

WITH (NodeChildren) AS
{
   --initialization
   SELECT ParentNodeID, NodeID AS ChildNodeID, 1 AS GenerationsRemoved
   FROM Nodes
}

The problem now is the recursive part. The obvious answer, of course, doesn't work:

WITH (NodeChildren) AS
{
   --initialization
   SELECT ParentNodeID, ChildNodeID, 1 AS GenerationsRemoved
   FROM Nodes

   UNION ALL

   --recursive execution
   SELECT parents.ParentNodeID, children.NodeID, parents.Generations+1
   FROM NodeChildren parents
    INNER JOIN NodeParents children
    ON parents.NodeID = children.ParentNodeID
} 

Msg 253, Level 16, State 1, Line 1
Recursive member of a common table expression 'NodeChildren' has multiple recursive references.

All the information needed to generate the entire recursive list is present in the inital CTE table. But if that's not allowed i'll try:

WITH (NodeChildren) AS
{
   --initialization
   SELECT ParentNodeID, NodeID, 1 AS GenerationsRemoved
   FROM Nodes

   UNION ALL

   --recursive execution
   SELECT parents.ParentNodeID, Nodes.NodeID, parents.Generations+1
   FROM NodeChildren parents
    INNER JOIN Nodes
    ON parents.NodeID = nodes.ParentNodeID
}

But that fails because it's not only joining on the recursive elements, but keeps recursivly adding the same rows over and over:

Msg 530, Level 16, State 1, Line 1
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

In SQL Server 2000 i simulated a CTE by using a User Defined Function (UDF):

CREATE FUNCTION [dbo].[fn_NodeChildren] ()
RETURNS @Result TABLE (
    ParentNodeID int NULL,
    ChildNodeID int NULL,
    Generations int NOT NULL) 
AS  
/*This UDF returns all "ParentNode" - "Child Node" combinations
    ...even multiple levels separated
BEGIN 
    DECLARE @Generations int
    SET @Generations = 1

    --Insert into the Return table all "Self" entries
    INSERT INTO @Result
    SELECT ParentNodeID, NodeID, @Generations
    FROM Nodes
    WHILE @@rowcount > 0 
    BEGIN
     SET @Generations = @Generations + 1
     --Add to the Children table: 
     -- children of all nodes just added 
     -- (i.e. Where @Result.Generation = CurrentGeneration-1)
     INSERT @Result
     SELECT CurrentParents.ParentNodeID, Nodes.NodeID, @Generations
     FROM Nodes
      INNER JOIN @Result CurrentParents
      ON Nodes.ParentNodeID = CurrentParents.ChildNodeID
     WHERE CurrentParents.Generations = @Generations - 1
    END
    RETURN
END

And the magic to keep it from blowing up was the limiting where clause: WHERE CurrentParents.Generations - @Generations-1

How do you prevent a recursive CTE from recursing forever?

A: 

Aside: do you have SQL Server 2008? This might be suited to the hierarchyid data type.

Marc Gravell
No, don't have 2008. And in 2003 when i originally need the problem solved, people said to wait for 2005.
Ian Boyd
Additionally, i can't get ahold of 2008, intall it, port the database to 2008, redesign the entire system, get the customer to buy, install and configured 2008 - all in the next few hours.
Ian Boyd
A: 

If I understand your intentions you can get you result by doing something like this:

DECLARE @StartID INT;
SET @StartID = 1;
WITH CTE (ChildNodeID, ParentNodeID, [Level]) AS
(
  SELECT  t1.ChildNodeID, 
          t1.ParentNodeID, 
          0
  FROM tblNodes AS t1
  WHERE ChildNodeID = @StartID
  UNION ALL
  SELECT  t1.ChildNodeID, 
          t1.ParentNodeID, 
          t2.[Level]+1
  FROM tblNodes AS t1
    INNER JOIN CTE AS t2 ON t1.ParentNodeID = t2.ChildNodeID    
)
SELECT t1.ChildNodeID, t2.ChildNodeID, t1.[Level]- t2.[Level] AS GenerationsDiff
FROM CTE AS t1
  CROSS APPLY CTE t2

This will return the generation difference between all nodes, you can modify it for you exact needs.

Joakim Backman
That doesn't work because it only starts from a specific node - i need all nodes. Then all those nodes children. Then all those node's children. etc.
Ian Boyd
A: 

Well, your answer is not quite that obvious :-)

WITH (NodeChildren) AS
{
   --initialization
   SELECT ParentNodeID, ChildNodeID, 1 AS GenerationsRemoved
   FROM Nodes

This part is called the "anchor" part of the recursive CTE - but it should really only select one or a select few rows from your table - this selects everything!

I guess what you're missing here is simply a suitable WHERE clause:

WITH (NodeChildren) AS
{
   --initialization
   SELECT ParentNodeID, ChildNodeID, 1 AS GenerationsRemoved
   FROM Nodes
   **WHERE ParentNodeID IS NULL**

However, I am afraid your requirement to have not just the "straight" hierarchy, but also the grandparent-child rows, might not be that easy to satisfy.... normally recursive CTE will only ever show one level and its direct subordinates (and that down the hierarchy, of course) - it doesn't usually skip one, two or even more levels.

Hope this helps a bit.

Marc

marc_s
i can't limit it to rows with no parent, because then it limits to rows with no parent. i need to include rows with a parent also.
Ian Boyd
+6  A: 

Try this:

WITH Nodes AS
(
   --initialization
   SELECT ParentNodeID, NodeID, 1 AS GenerationsRemoved
   FROM ##Nodes

   UNION ALL

   ----recursive execution
   SELECT P.ParentNodeID, N.NodeID, P.GenerationsRemoved + 1
   FROM Nodes AS P
   INNER JOIN ##Nodes AS N
   ON P.NodeID = N.ParentNodeID
   WHERE P.GenerationsRemoved <= 10

)
SELECT ParentNodeID, NodeID, GenerationsRemoved
FROM Nodes
ORDER BY ParentNodeID, NodeID, GenerationsRemoved

Basically removing the "only show me absolute parents" from the initialization query; That way it generates the results starting from each of them and decending from there. I also added in the "WHERE P.GenerationsRemoved <= 10" as an infinite recursion catch(replace 10 with any number up to 100 to fit your needs). Then add the sort so it looks like the results you wanted.

Chris Shaffer
Brilliant solution! If you still get the max recursion error with this it's because you have cyclic references in your hierarchy.
Richard Poole
You can limit recursions using OPTION (MAXRECURSION nn) correctly. This is not a correct solution
gbn
OPTION(MAXRECURSION nn) causes the statement to fail; My goal with including the WHERE clause was to guarantee a return (obviously there will be times that a failure is better).
Chris Shaffer
A: 

Please can you give us actual nodes schema and data? Or explain what I've missed? Simply put, your code works as far as I can see and test

You aren't recursing forever, just more then 100 times (the default). If you have 1000s of rows in your data table, then this is probably the cause.

When you get to the bottom, I think you simply need OPTION (MAXRECURSION 0) in real life. Zero means forever, otherwise 1-32767

Below this is all cut'n'paste ready into SQL tools.

--Same table and function as above
CREATE TABLE dbo.Nodes
(
 ParentNodeID varchar(50) NULL,
 NodeID varchar(50) PRIMARY KEY NOT NULL
)
GO
CREATE FUNCTION [dbo].[fn_NodeChildren] ()
RETURNS @Result TABLE (
    ParentNodeID int NULL,
    ChildNodeID int NULL,
    Generations int NOT NULL) 
AS  
BEGIN 
    DECLARE @Generations int
    SET @Generations = 1

    --Insert into the Return table all "Self" entries
    INSERT INTO @Result
    SELECT ParentNodeID, NodeID, @Generations
    FROM dbo.Nodes
    WHILE @@rowcount > 0 
    BEGIN
        SET @Generations = @Generations + 1
        --Add to the Children table: 
        --      children of all nodes just added 
        -- (i.e. Where @Result.Generation = CurrentGeneration-1)
        INSERT @Result
        SELECT CurrentParents.ParentNodeID, dbo.Nodes.NodeID, @Generations
        FROM dbo.Nodes
                INNER JOIN @Result CurrentParents
                ON dbo.Nodes.ParentNodeID = CurrentParents.ChildNodeID
        WHERE CurrentParents.Generations = @Generations - 1
    END
    RETURN
END
GO

--Fill with "real" values
INSERT INTO dbo.Nodes VALUES (null, 1)
INSERT INTO dbo.Nodes VALUES (1, 2)
INSERT INTO dbo.Nodes VALUES (2, 4)
INSERT INTO dbo.Nodes VALUES (2, 5)
INSERT INTO dbo.Nodes VALUES (2, 7)
INSERT INTO dbo.Nodes VALUES (1, 3)
INSERT INTO dbo.Nodes VALUES (3, 6)
INSERT INTO dbo.Nodes VALUES (7, 8 )
GO

--Cut and Paste OP's overflow CTE = 21 rows.
--OP's parent/granny/great-granny examples have 20 rows, but we have a single great-great-granny
WITH NodeChildren AS
(
   --initialization
   SELECT ParentNodeID, NodeID, 1 AS GenerationsRemoved
   FROM dbo.Nodes

   UNION ALL

   --recursive execution
   SELECT parents.ParentNodeID, dbo.Nodes.NodeID, parents.GenerationsRemoved+1
   FROM NodeChildren parents
    INNER JOIN dbo.Nodes
    ON parents.NodeID = dbo.Nodes.ParentNodeID
)
SELECT ParentNodeID, NodeID, GenerationsRemoved
FROM NodeChildren
ORDER BY
GenerationsRemoved, ParentNodeID, NodeID
GO

--Now call the SQL 2000 UDF. The sort helps compare the 21 rows
SELECT ParentNodeID, ChildNodeID, Generations FROM [dbo].[fn_NodeChildren]()
ORDER BY
Generations, ParentNodeID, ChildNodeID
GO

--Now difference the two sets in case eyeballs fail
WITH NodeChildren AS
(
   --initialization
   SELECT ParentNodeID, NodeID, 1 AS GenerationsRemoved
   FROM dbo.Nodes

   UNION ALL

   --recursive execution
   SELECT parents.ParentNodeID, dbo.Nodes.NodeID, parents.GenerationsRemoved+1
   FROM NodeChildren parents
    INNER JOIN dbo.Nodes
    ON parents.NodeID = dbo.Nodes.ParentNodeID
)
SELECT
    *
FROm
    (SELECT ParentNodeID, NodeID, GenerationsRemoved FROM NodeChildren) CTE
    FULL OUTER JOIN
    (SELECT ParentNodeID, ChildNodeID, Generations FROM [dbo].[fn_NodeChildren]()) FN ON CTE.GenerationsRemoved = FN.Generations AND CTE.NodeID = FN.ChildNodeID
WHERE
    (CTE.NodeID IS NULL OR FN.ChildNodeID IS NULL)
--    AND
--    NOT (CTE.ParentNodeID IS NOT NULL AND FN.ParentNodeID IS NOT NULL)
GO

--Repeat with first A, B, C example but use ASCII values so FN works as-is
DROP TABLE dbo.Nodes
GO
CREATE TABLE dbo.Nodes
(
 ParentNodeID varchar(50) NULL,
 NodeID varchar(50) PRIMARY KEY NOT NULL
)
GO
INSERT INTO dbo.Nodes VALUES (null, 41)
INSERT INTO dbo.Nodes VALUES (41, 42)
INSERT INTO dbo.Nodes VALUES (42, 43)
GO

--6 rows from CTE, different sort to match example
WITH NodeChildren AS
(
   --initialization
   SELECT ParentNodeID, NodeID, 1 AS GenerationsRemoved
   FROM dbo.Nodes

   UNION ALL

   --recursive execution
   SELECT parents.ParentNodeID, dbo.Nodes.NodeID, parents.GenerationsRemoved+1
   FROM NodeChildren parents
    INNER JOIN dbo.Nodes
    ON parents.NodeID = dbo.Nodes.ParentNodeID
)
SELECT ParentNodeID, NodeID, GenerationsRemoved
FROM NodeChildren
ORDER BY
ParentNodeID, NodeID
GO

--Exact same 6 rows from FN
SELECT ParentNodeID, ChildNodeID, Generations FROM [dbo].[fn_NodeChildren]()
ORDER BY
ParentNodeID, ChildNodeID
GO

--Retry 1st CTE mentioned without IS NULL. Gives expected results
WITH NodeChildren AS
(
   --initialization
   SELECT ParentNodeID, NodeID, 1 AS GenerationsRemoved
   FROM dbo.Nodes
   --WHERE ParentNodeID IS NULL

   UNION ALL

   --recursive execution
   SELECT P.ParentNodeID, N.NodeID, P.GenerationsRemoved + 1
   FROM NodeChildren AS P
      INNER JOIN dbo.Nodes AS N
      ON P.NodeID = N.ParentNodeID
)
SELECT ParentNodeID, NodeID, GenerationsRemoved
FROM NodeChildren

--Tidy up
DROP FUNCTION [dbo].[fn_NodeChildren] 
GO
DROP TABLE dbo.Nodes
GO

--Discuss
gbn
A: 

The issue is with the Sql Server default recursion limit (100). If you try your example at the top with the anchor restriction removed (also added Order By):

WITH NodeChildren AS
(
   --initialization
   SELECT ParentNodeID, NodeID, 1 AS GenerationsRemoved
   FROM Nodes

   UNION ALL

   --recursive execution
   SELECT P.ParentNodeID, N.NodeID, P.GenerationsRemoved + 1
   FROM NodeChildren AS P
      inner JOIN Nodes AS N
      ON P.NodeID = N.ParentNodeID
)
SELECT ParentNodeID, NodeID, GenerationsRemoved
FROM NodeChildren
ORDER BY ParentNodeID ASC

This produces the desired results. The problem you aare facing is with a larger number of rows you will recirse over 100 times which is a default limit. This can be changed by adding option (max recursion x) after your query, where x is a number between 1 and 32767. x can also be set to 0 which sets no limit however could very quickly have a very detrimental impact on your server performance. Clearly as the number of rows in Nodes increases, the number of recursions will can rise very quickly and I would avoid this approach unless there was a known upper limit on the rows in the table. For completeness, the final query should look like:

 WITH NodeChildren AS
    (
       --initialization
       SELECT ParentNodeID, NodeID, 1 AS GenerationsRemoved
       FROM Nodes

       UNION ALL

       --recursive execution
       SELECT P.ParentNodeID, N.NodeID, P.GenerationsRemoved + 1
       FROM NodeChildren AS P
          inner JOIN Nodes AS N
          ON P.NodeID = N.ParentNodeID
    )
    SELECT * 
    FROM NodeChildren
    ORDER BY ParentNodeID
    OPTION (MAXRECURSION 32767)

Where 32767 could be adjusted downwards to fit your scenario

Macros
A: 

Have you tried constructing a path in the CTE and using it to identify ancestors?

You can then subtract the descendant node depth from the ancestor node depth to calculate the GenerationsRemoved column, like so...

DECLARE @Nodes TABLE
(
    NodeId varchar(50) PRIMARY KEY NOT NULL,
    ParentNodeId varchar(50) NULL
)

INSERT INTO @Nodes (NodeId, ParentNodeId) VALUES ('A', NULL)
INSERT INTO @Nodes (NodeId, ParentNodeId) VALUES ('B', 'A')
INSERT INTO @Nodes (NodeId, ParentNodeId) VALUES ('C', 'B')

DECLARE @Hierarchy TABLE
(
    NodeId varchar(50) PRIMARY KEY NOT NULL,
    ParentNodeId varchar(50) NULL,
    Depth int NOT NULL,
    [Path] varchar(2000) NOT NULL
)

WITH Hierarchy AS
(
    --initialization
    SELECT NodeId, ParentNodeId, 0 AS Depth, CONVERT(varchar(2000), NodeId) AS [Path]
    FROM @Nodes
    WHERE ParentNodeId IS NULL

    UNION ALL

    --recursive execution
    SELECT n.NodeId, n.ParentNodeId, p.Depth + 1, CONVERT(varchar(2000), p.[Path] + '/' + n.NodeId)
    FROM Hierarchy AS p
    INNER JOIN @Nodes AS n
    ON p.NodeId = n.ParentNodeId
)
INSERT INTO @Hierarchy
SELECT *
FROM Hierarchy

SELECT parent.NodeId AS AncestorNodeId, child.NodeId AS DescendantNodeId, child.Depth - parent.Depth AS GenerationsRemoved
FROM @Hierarchy AS parent
INNER JOIN @Hierarchy AS child
ON child.[Path] LIKE parent.[Path] + '/%'
Richard Poole