views:

94

answers:

6

I think I've got the format of Recursive CTEs down well enough to write one, but still find myself frustrated to no end that I cannot manually process one (pretend to be the SQL engine myself and reach the result set with pen and paper). I've found this, which is close to what I'm looking for, but not detailed enough. I have no problem tracing through a C++ recursive function and understanding how it runs -- but for SQL I don't understand why or how the engine knows to stop. Does the anchor and recursive block get called everytime, or is the anchor skipped in later iterations? (I doubt it but I'm trying to expression my confusion about the way it seems to jump around.) If the anchor is called each time, how does the anchor not appear multiple times in the final result? I hope someone can just do a break down line 1 line 2, etc. what happens and what is "in memory" as the result set accumulates.

I've taken the liberty of stealing my example from this page, since it seems to be the easiest to understand.

    DECLARE @tbl TABLE ( 
     Id INT 
    ,[Name] VARCHAR(20) 
    ,ParentId INT 
    ) 

INSERT INTO @tbl( Id, Name, ParentId ) 
VALUES 
 (1, 'Europe', NULL) 
,(2, 'Asia',   NULL) 
,(3, 'Germany', 1) 
,(4, 'UK',      1) 
,(5, 'China',   2) 
,(6, 'India',   2) 
,(7, 'Scotland', 4) 
,(8, 'Edinburgh', 7) 
,(9, 'Leith', 8) 

; 

WITH  abcd 
    AS ( 
          -- anchor 
        SELECT  id, [Name], ParentID, 
                CAST(([Name]) AS VARCHAR(1000)) AS "Path" 
        FROM    @tbl 
        WHERE   ParentId IS NULL 
        UNION ALL 
          --recursive member 
        SELECT  t.id, t.[Name], t.ParentID, 
                CAST((a.path + '/' + t.Name) AS VARCHAR(1000)) AS "Path" 
        FROM    @tbl AS t 
                JOIN abcd AS a 
                  ON t.ParentId = a.id 
       )
SELECT * FROM abcd 
+1  A: 

To answer part of your question

The termination check is implicit; recursion stops when no rows are returned from the previous invocation.

Source: http://msdn.microsoft.com/en-us/library/ms186243.aspx

Edit: Actually That link also has a line by line breakdown

Pseudocode and Semantics

The recursive CTE structure must contain at least one anchor member and one recursive member. The following pseudocode shows the components of a simple recursive CTE that contains a single anchor member and single recursive member.

WITH cte_name ( column_name [,...n] )

AS

(

CTE_query_definition –- Anchor member is defined.

UNION ALL

CTE_query_definition –- Recursive member is defined referencing cte_name.

)

-- Statement using the CTE

SELECT *

FROM cte_name

The semantics of the recursive execution is as follows:

  1. Split the CTE expression into anchor and recursive members.
  2. Run the anchor member(s) creating the first invocation or base result set (T0).
  3. Run the recursive member(s) with Ti as an input and Ti+1 as an output.
  4. Repeat step 3 until an empty set is returned.
  5. Return the result set. This is a UNION ALL of T0 to Tn.
Martin Smith
+3  A: 

Think of a recursive CTE as of an endless UNION ALL:

WITH    rows AS
        (
        SELECT  *
        FROM    mytable
        WHERE   anchor_condition
        ),
        rows2 AS
        (
        SELECT  *
        FROM    set_operation(mytable, rows)
        ),
        rows3 AS
        (
        SELECT  *
        FROM    set_operation(mytable, rows2)
        ),
        …
SELECT  *
FROM    rows
UNION ALL
SELECT  *
FROM    rows2
UNION ALL
SELECT  *
FROM    rows3
UNION ALL
…

In your case, that would be:

WITH    abcd1 AS
        ( 
        SELECT  *
        FROM    @tbl t
        WHERE   ParentId IS NULL 
        ),
        abcd2 AS
        ( 
        SELECT  t.*
        FROM    abcd1
        JOIN    @tbl t
        ON      t.ParentID = abcd1.id
        ),
        abcd3 AS
        ( 
        SELECT  t.*
        FROM    abcd2
        JOIN    @tbl t
        ON      t.ParentID = abcd2.id
        ),
        abcd4 AS
        ( 
        SELECT  t.*
        FROM    abcd3
        JOIN    @tbl t
        ON      t.ParentID = abcd3.id
        ),
        abcd5 AS
        ( 
        SELECT  t.*
        FROM    abcd4
        JOIN    @tbl t
        ON      t.ParentID = abcd4.id
        ),
        abcd6 AS
        ( 
        SELECT  t.*
        FROM    abcd5
        JOIN    @tbl t
        ON      t.ParentID = abcd5.id
        )
SELECT  *
FROM    abcd1
UNION ALL
SELECT  *
FROM    abcd2
UNION ALL
SELECT  *
FROM    abcd3
UNION ALL
SELECT  *
FROM    abcd4
UNION ALL
SELECT  *
FROM    abcd5
UNION ALL
SELECT  *
FROM    abcd6

Since abcd6 yields no results, this implies a stopping condition.

Theoretically, a recursive CTE can be infinite, but practically, SQL Server tries to forbid the queries that would lead to infinite recordsets.

You may want to read this article:

Quassnoi
Lovely Explanation Quassnoi !
Baaju
+1  A: 

You were probably wanting this link. No, the anchor is not executed multiple times (it couldn't be, then union all would require that all of the results appear). Details at previous link.

Donnie
+1  A: 

I think it breaks down like this:

  1. The anchor statement is executed. This gives you a set of results, called the base set, or T0.

  2. The recursive statement is executed, using T0 as the table to execute the query against. This happens automatically when you query a CTE.

  3. If the recursive member returns some results, it creates a new set, T1. The recursive member is then executed again, using T1 as input, creating T2 if there are any results.

  4. Step 3 continues until no more results are generated, OR the maximum number of recursions has been met, as set by the MAX_RECURSION option.

This page probably explains it best. It has a step-by-step walkthrough of the execution path of a CTE.

womp
That's 3 of us linked to that article now :-)
Martin Smith
+1  A: 

Step 1:

1 Europe NULL Europe
2 Asia NULL Asia

Step 2:

1 Europe NULL Europe
2 Asia NULL Asia
3 Germany 1 Europe/Germany
4 UK 1 Europe/UK
5 China 2 Asia/China
6 India 2 Asia/India

Step 3:

1 Europe NULL Europe
2 Asia NULL Asia
3 Germany 1 Europe/Germany
4 UK 1 Europe/UK
5 China 2 Asia/China
6 India 2 Asia/India
7 Scotland 4 Europe/UK/Scotland

Step 4:

1 Europe NULL Europe
2 Asia NULL Asia
3 Germany 1 Europe/Germany
4 UK 1 Europe/UK
5 China 2 Asia/China
6 India 2 Asia/India
7 Scotland 4 Europe/UK/Scotland
8 Edinburgh 7 Europe/UK/Scotland/Edinburgh

Step 5:

1 Europe NULL Europe 0
2 Asia NULL Asia 0
3 Germany 1 Europe/Germany 1
4 UK 1 Europe/UK 1
5 China 2 Asia/China 1
6 India 2 Asia/India 1
7 Scotland 4 Europe/UK/Scotland 2
8 Edinburgh 7 Europe/UK/Scotland/Edinburgh 3
9 Leith 8 Europe/UK/Scotland/Edinburgh/Leith 4

The last column in step 5 is the Level. During each level the rows get added with respect to what is already available. Hope this helps.

Baaju
+2  A: 

Hi,

The algorithm that CTE use is:

  1. execute the anchor part, get result r0

  2. execute the recursive part, using r0 as input, and output r1

  3. execute the recursive part, using r1 as input, and output r2

  4. execute the recursive part, using r3 as input, and output r3

    ...

  5. execute the recursive part, using r(n-1) as input, and output rn, once rn is null, use UNION ALL to combine r0, r1, r2 ... r(n-1) as the final result

Lets take an example:

WITH    cte ( value )
          AS (
               SELECT   1
               UNION ALL
               SELECT   value + 1
               FROM     cte
               WHERE    value < 4
             )
    SELECT  *
    FROM    cte

The result of this query is:

value
-----------
1
2
3
4

(4 row(s) affected)

Let's examine it step by step:

anchor result
     1            : r0
 as input to
     |
     |
     V
     2
 as input to      : r1
     |
     |
     V
     3
 as input to      : r2
     |
     |
     V
     4
 as input to      : r3
     |
     |
     V
   NULL


r0
UNION ALL
r1
UNION ALL       ------------>        1, 2, 3, 4
r2
UNION ALL
r3

Now we use UNION ALL to combine these result(r0, r1, r2, r3) to get the final result.

Yousui