views:

774

answers:

4

Oracle SQL can do hierarchical queries since v2, using their proprietary CONNECT BY syntax. In their latest 11g release 2, they added recursive subquery factoring, also known as the recursive with clause. This is the ANSI standard, and if I understand correctly, this one has been implemented by other RDBMS vendors as well.

When comparing the connect-by with the recursive with, I noticed a difference in the result set when using cycle detection. The connect by results are more intuitive to me, so I'm wondering if Oracle's implementation contains a bug, or if this is standard ANSI and expected behaviour. Therefore my question is if you can check the recursive with query using other databases like MySQL, DB2, SQL Server and others. Provided those databases support the recursive with clause of course.

Here is how it works on Oracle 11.2.0.1.0

SQL> select *
  2    from t
  3  /

        ID  PARENT_ID
---------- ----------
         1          2
         2          1

2 rows selected.

The query using CONNECT BY syntax:

SQL>  select id
  2        , parent_id
  3        , connect_by_iscycle
  4     from t
  5  connect by nocycle parent_id = prior id
  6    start with id = 1
  7  /

        ID  PARENT_ID CONNECT_BY_ISCYCLE
---------- ---------- ------------------
         1          2                  0
         2          1                  1

2 rows selected.

Which looks intuitive to me. However, using the new ANSI syntax it returns one more row:

SQL> with tr (id,parent_id) as
  2  ( select id
  3         , parent_id
  4      from t
  5     where id = 1
  6     union all
  7    select t.id
  8         , t.parent_id
  9      from t
 10           join tr on t.parent_id = tr.id
 11  ) cycle id set is_cycle to '1' default '0'
 12  select id
 13       , parent_id
 14       , is_cycle
 15    from tr
 16  /

        ID  PARENT_ID I
---------- ---------- -
         1          2 0
         2          1 0
         1          2 1

3 rows selected.

This is the script you can use to check:

create table t
( id        number
, parent_id number
);
insert into t values (1, 2);
insert into t values (2, 1);
commit;
with tr (id,parent_id) as
( select id
       , parent_id
    from t
   where id = 1
   union all
  select t.id
       , t.parent_id
    from t
         join tr on t.parent_id = tr.id
) cycle id set is_cycle to '1' default '0'
select id
     , parent_id
     , is_cycle
  from tr;
+1  A: 

MySQL Server version 5.0.45 didn't like with:
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'with tr (id, parent_id) as (select id, parent_id from t where id = 1 union all s' at line 1

wallyk
Thanks for trying, wallyk. But does MySQL support something similar and this just isn't the right syntax, or doesn't MySQL doesn't support recursive queries at all?
Rob van Wijk
I don't think it does. It was only a few years ago that it did not support stored procedures, so it has evolved rather quickly. I wouldn't be surprised if such things are in the latest version, or a upcoming version.
wallyk
+2  A: 

AFAIK:

  • MySQL doesn't support recursive CTE's
  • SQL Sever does not support cycle detection in recursive CTE's
Andomar
MySQL doesn't support CTEs at all.
OMG Ponies
That's clear, thanks. So MySQL and SQL Server cannot be used to test this script.
Rob van Wijk
Indeed, MS SQL Server has a maximum recursion limit, something like 100 by default.
Vilx-
+2  A: 

PostgreSQL supports WITH-style hierarchical queries, but doesn't have any automatic cycle detection. This means that you need to write your own and the number of rows returned depends on the way you specify join conditions in the recursive part of the query.

Both examples use an array if IDs (called all_ids) to detect loops:

WITH recursive tr (id, parent_id, all_ids, cycle) AS (
    SELECT id, parent_id, ARRAY[id], false
    FROM t
    WHERE id = 1
    UNION ALL
    SELECT t.id, t.parent_id, all_ids || t.id, t.id = ANY(all_ids)
    FROM t
    JOIN tr ON t.parent_id = tr.id AND NOT cycle)
SELECT id, parent_id, cycle
FROM tr;

 id | parent_id | cycle
----+-----------+-------
  1 |         2 | f
  2 |         1 | f
  1 |         2 | t


WITH recursive tr (id, parent_id, all_ids, cycle) AS (
    SELECT id, parent_id, ARRAY[id], false
    FROM t
    WHERE id = 1
    UNION ALL
    SELECT t.id, t.parent_id, all_ids || t.id, (EXISTS(SELECT 1 FROM t AS x WHERE x.id = t.parent_id))
    FROM t
    JOIN tr ON t.parent_id = tr.id
    WHERE NOT t.id = ANY(all_ids))
SELECT id, parent_id, cycle
FROM tr;

 id | parent_id | cycle
----+-----------+-------
  1 |         2 | f
  2 |         1 | t
Aleksander Kmetec
Very interesting queries. Oracle doesn't support that array syntax, but it's nice to see that the results are the same as in Oracle's ANSI syntax. It leads me to think that the ANSI query maybe doesn't contain a bug after all. Thanks, Aleksander.
Rob van Wijk
+3  A: 

From documentation on CONNECT_BY_ISCYCLE:

The CONNECT_BY_ISCYCLE pseudocolumn returns 1 if the current row has a child which is also its ancestor

and that on CYCLE:

A row is considered to form a cycle if one of its ancestor rows has the same values for the cycle columns.

In your example, row 2 does have a child which is also its ancestor, but its id has not been returned yet.

In other words, CONNECT_BY_ISCYCLE checks the children (which are yet to be returned), while CYCLE checks the current row (which is already returned).

CONNECT BY is row based, while recursive CTE's are set-based.

There is no concept of a "child" in a recursive CTE. It's a set based operation which can yield results completely out of the tree. Generally speaking, the anchor part and the recursive part can even use the different tables.

Since recursive CTE's are usually used to build hierarchy trees, Oracle decided to add a cycle check. But due the set-based way the recursive CTE's operate, it's generally impossible to tell will the next step generate a cycle or not.

To perform the "next" step, the whole "current" set needs to be available, but to generate each row of the current set (which includes the cycle column) we just need to have the results of the "next" operation. It's not a problem with a single row (like in CONNECT BY), but it is a problem with a set as a whole.

Didn't look into Oracle 11 yet, but SQL Server implements recursive CTE's by just hiding a CONNECT BY behind them, which requires placing numerous restrictions (all of which effectively forbid all set-based operations).

PostgreSQL's implementation, on the other hand, is truly set-based.

As was mentioned before, MySQL does not implement CTE's at all (it does not implement HASH JOIN's or MERGE JOINs as well, only the nested loops, so don't be surprised much).

Ironically, I received a letter today on this very subject, which I will cover in my blog.

Update:

Recursive CTE's in SQL Server are no more than CONNECT BY in disguise. See this article in my blog for shocking details:

Quassnoi
Good explanation, Quassnoi. Thanks. So the recursive with clause works as expected. It's just different than the connect-by. It's nice to be aware of these little nuances between the two.
Rob van Wijk