views:

1474

answers:

9

I've got a standard boss/subordinate employee table. I need to select a boss (specified by ID) and all his subordinates (and their subrodinates, etc). Unfortunately the real world data has some loops in it (for example, both company owners have each other set as their boss). The simple recursive query with a CTE chokes on this (maximum recursion level of 100 exceeded). Can the employees still be selected? I care not of the order in which they are selected, just that each of them is selected once.


Added: You want my query? Umm... OK... I though it is pretty obvious, but - here it is:

with
UserTbl as -- Selects an employee and his subordinates.
(
    select a.[User_ID], a.[Manager_ID] from [User] a WHERE [User_ID] = @UserID
    union all
    select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
)
select * from UserTbl


Added 2: Oh, in case it wasn't clear - this is a production system and I have to do a little upgrade (basically add a sort of report). Thus, I'd prefer not to modify the data if it can be avoided.

A: 

basicaly if you have loops like this in data you'll have to do the retreival logic by yourself. you could use one cte to get only subordinates and other to get bosses.

another idea is to have a dummy row as a boss to both company owners so they wouldn't be each others bosses which is ridiculous. this is my prefferd option.

Mladen Prajdic
Fixing data is an option, but I'd rather leave that as the last option (btw - it's also possible for a person NOT to have a boss, so I don't even have to create a dummy user).
Vilx-
Also, there are other loops there I think.
Vilx-
the only way then is to create your own logic.
Mladen Prajdic
+1  A: 

this will work for the initial recursive link, but might not work for longer links

DECLARE @Table TABLE(
     ID INT,
     PARENTID INT
)

INSERT INTO @Table (ID,PARENTID) SELECT 1, 2

INSERT INTO @Table (ID,PARENTID) SELECT 2, 1

INSERT INTO @Table (ID,PARENTID) SELECT 3, 1

INSERT INTO @Table (ID,PARENTID) SELECT 4, 3

INSERT INTO @Table (ID,PARENTID) SELECT 5, 2


SELECT * FROM @Table

DECLARE @ID INT

SELECT @ID = 1

;WITH boss (ID,PARENTID) AS (
    SELECT ID,
      PARENTID
    FROM @Table
    WHERE PARENTID = @ID
),
 bossChild (ID,PARENTID) AS (
    SELECT ID,
      PARENTID
    FROM boss
    UNION ALL
    SELECT t.ID,
      t.PARENTID
    FROM @Table t INNER JOIN
      bossChild b ON t.PARENTID = b.ID
    WHERE t.ID NOT IN (SELECT PARENTID FROM boss)
)
SELECT  *
FROM    bossChild
OPTION (MAXRECURSION 0)

what i would recomend is to use a while loop, and only insert links into temp table if the id does not already exist, thus removing endless loops.

astander
A: 

Not a generic solution, but might work for your case: in your select query modify this:

select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])

to become:

select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID]) 
   and a.[User_ID] <> @UserID
van
Heh, nice idea! Indeed, I believe that would work! :)
Vilx-
If employee may have at max 1 manager, then it should work fine. If not, you may get loops on lower levels, which will not stop the recursion.
van
A: 

The preferrable solution is to clean up the data and to make sure you do not have any loops in the future - that can be accomplished with a trigger or a UDF wrapped in a check constraint.

However, you can use a multi statement UDF as I demonstrated here: Avoiding infinite loops. Part One

You can add a NOT IN() clause in the join to filter out the cycles.

AlexKuznetsov
True, and this was how I solved it in the end, but the question is still valid, and for some configurations this might be what is needed anyway (for example, if you want to select all the nodes in a graph that can be reached from a specific start node).
Vilx-
A: 

You need a some method to prevent your recursive query from adding User ID's already in the set. However, as sub-queries and double mentions of the recursive table are not allowed (thank you van) you need another solution to remove the users already in the list.

The solution is to use EXCEPT to remove these rows. This should work according to the manual. Multiple recursive statements linked with union-type operators are allowed. Removing the users already in the list means that after a certain number of iterations the recursive result set returns empty and the recursion stops.

with UserTbl as -- Selects an employee and his subordinates.
(
    select a.[User_ID], a.[Manager_ID] from [User] a WHERE [User_ID] = @UserID
    union all
    (
      select a.[User_ID], a.[Manager_ID] 
        from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
        where a.[User_ID] not in (select [User_ID] from UserTbl)
      EXCEPT
        select a.[User_ID], a.[Manager_ID] from UserTbl a 
     )
)
select * from UserTbl;

The other option is to hardcode a level variable that will stop the query after a fixed number of iterations or use the MAXRECURSION query option hint, but I guess that is not what you want.

Matijs
This will not work: multiple recursive references are not allowed in CTE.
van
It is worse than that I see, you are not even allowed to use sub-queries in a CTE.
Matijs
I think I solved it now.
Matijs
A: 

I can think of two approaches.

1) Produce more rows than you want, but include a check to make sure it does not recurse too deep. Then remove duplicate User records.

2) Use a string to hold the Users already visited. Like the not in subquery idea that didn't work.

Approach 1:

; with TooMuchHierarchy as (
    select "User_ID"
     , Manager_ID 
     , 0 as Depth
    from "User" 
    WHERE "User_ID" = @UserID
    union all
    select U."User_ID"
     , U.Manager_ID
     , M.Depth + 1 as Depth
    from TooMuchHierarchy M
    inner join "User" U 
     on U.Manager_ID = M."user_id"
    where Depth < 100) -- Warning MAGIC NUMBER!!
, AddMaxDepth as (
    select "User_ID"
     , Manager_id
     , Depth
     , max(depth) over (partition by "User_ID") as MaxDepth
    from TooMuchHierarchy)
select "user_id", Manager_Id 
from AddMaxDepth
where Depth = MaxDepth

The line where Depth < 100 is what keeps you from getting the max recursion error. Make this number smaller, and less records will be produced that need to be thrown away. Make it too small and employees won't be returned, so make sure it is at least as large as the depth of the org chart being stored. Bit of a maintence nightmare as the company grows. If it needs to be bigger, then add option (maxrecursion ... number ...) to whole thing to allow more recursion.

Approach 2:

; with Hierarchy as (
    select "User_ID"
     , Manager_ID 
     , '#' + cast("user_id" as varchar(max)) + '#' as user_id_list
    from "User" 
    WHERE "User_ID" = @UserID
    union all
    select U."User_ID"
     , U.Manager_ID
     , M.user_id_list + '#' + cast(U."user_id" as varchar(max)) + '#' as user_id_list
    from Hierarchy M
    inner join "User" U 
     on U.Manager_ID = M."user_id"
    where user_id_list not like '%#' + cast(U."User_id" as varchar(max)) + '#%')
select "user_id", Manager_Id 
from Hierarchy
Shannon Severance
A: 

You don't have to do it recursively. It can be done in a WHILE loop. I guarantee it will be quicker: well it has been for me every time I've done timings on the two techniques. This sounds inefficient but it isn't since the number of loops is the recursion level. At each iteration you can check for looping and correct where it happens. You can also put a constraint on the temporary table to fire an error if looping occurs, though you seem to prefer something that deals with looping more elegantly. You can also trigger an error when the while loop iterates over a certain number of levels (to catch an undetected loop? - oh boy, it sometimes happens.

The trick is to insert repeatedly into a temporary table (which is primed with the root entries), including a column with the current iteration number, and doing an inner join between the most recent results in the temporary table and the child entries in the original table. Just break out of the loop when @@rowcount=0! Simple eh?

Phil Factor
A: 

This is the code I used on a project to chase up and down hierarchical relationship trees.

User defined function to capture subordinates:

CREATE FUNCTION fn_UserSubordinates(@User_ID INT)
RETURNS @SubordinateUsers TABLE (User_ID INT, Distance INT) AS BEGIN
    IF @User_ID IS NULL
     RETURN

    INSERT INTO @SubordinateUsers (User_ID, Distance) VALUES ( @User_ID, 0)

    DECLARE @Distance INT, @Finished BIT
    SELECT @Distance = 1, @Finished = 0

    WHILE @Finished = 0
    BEGIN
     INSERT INTO @SubordinateUsers
      SELECT S.User_ID, @Distance
       FROM Users AS S
       JOIN @SubordinateUsers AS C
        ON C.User_ID = S.Manager_ID
       LEFT JOIN @SubordinateUsers AS C2
        ON C2.User_ID = S.User_ID
       WHERE C2.User_ID IS NULL
     IF @@RowCount = 0
      SET @Finished = 1

     SET @Distance = @Distance + 1
    END

    RETURN
END

User defined function to capture managers:

CREATE FUNCTION fn_UserManagers(@User_ID INT)
RETURNS @User TABLE (User_ID INT, Distance INT) AS BEGIN
    IF @User_ID IS NULL
     RETURN

    DECLARE @Manager_ID INT

    SELECT @Manager_ID = Manager_ID
    FROM UserClasses WITH (NOLOCK)
    WHERE User_ID = @User_ID

    INSERT INTO @UserClasses (User_ID, Distance)
     SELECT User_ID, Distance + 1
     FROM dbo.fn_UserManagers(@Manager_ID)

    INSERT INTO @User (User_ID, Distance) VALUES (@User_ID, 0)

    RETURN
END
CptSkippy
+1  A: 

Hi John -

I know you asked this question a while ago, but here is a solution that may work for detecting infinite recursive loops. I generate a path and I checked in the CTE condition if the USER ID is in the path, and if it is it wont process it again. Hope this helps.

Jose

DECLARE @Table TABLE(
    USER_ID INT,
    MANAGER_ID INT )
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 1, 2
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 2, 1
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 3, 1
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 4, 3
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 5, 2

DECLARE @UserID INT
SELECT @UserID = 1

;with
UserTbl as -- Selects an employee and his subordinates.
(
    select 
        '/'+cast( a.USER_ID as varchar(max)) as [path],
        a.[User_ID], 
        a.[Manager_ID] 
    from @Table a 
    where [User_ID] = @UserID
    union all
    select
        b.[path] +'/'+ cast( a.USER_ID as varchar(max)) as [path],
        a.[User_ID], 
        a.[Manager_ID] 
    from @Table a 
    inner join UserTbl b 
        on (a.[Manager_ID]=b.[User_ID])
    where charindex('/'+cast( a.USER_ID as varchar(max))+'/',[path]) = 0
)
select * from UserTbl
Jose Chama
Nice one! Probably not very speedy, but still nice! :) Oh, and btw - it wasn't john who asked this question. :)
Vilx-
Hmm... I think this has a trivial error in the case that the duplicate user ID is at the end of the path. The final user ID in the path doesn't contain a trailing slash, but the charindex() requires it.
Vilx-
Hi Vilx, Sorry for calling you John but I got confused by the edit that John did.The only time that you will have the scenario that you described is when a User ID is pointing to itself, and in that case it will show you the relationship only once without entering in an infinite loop. Sometimes bugs become features, everything depends on the eye of the beholder. I used this solution but the ids are uniqueidentifiers and I do not need to use the trailing slash, but with numbers is a different story. Regards. :)
Jose Chama