views:

139

answers:

4

Consider the following rows in a database:

Id      |   Parent
__________________
1           null
2           1
3           2
4           3
5           null
6           5

Each Id that has a null Parent is the "Owner"/"Super Parent".

What would be the best approach, performance wise, to collect the parents and their children ? Should I use LINQ or Stored Procedures ?

I want the end result to be IEnumerable<IEnumerable<int>>.

+5  A: 

In SQL, you could query using a CTE. For example, to retrieve a list of nodes with their parent and the highest parent in their tree:

declare @t table (id int, parent int)
insert @t (id, parent) values (1, null), (2,1), (3,2), (4,3), (5,null), (6,5)

; with cte as (
    select  id, parent, id as head
    from    @t
    where   parent is null
    union all
    select  child.id, child.parent, parent.head
    from    @t child
    join    cte parent
    on      parent.id = child.parent
)
select  *
from    cte

This gives:

id  parent  head
1   NULL    1
2   1       1
3   2       1
4   3       1
5   NULL    5
6   5       5

Note that I changed your example data so row 2 is no longer a child of itself, but a child of row 1.

Andomar
+1  A: 

If the table is not too big your best bet would be to return the entire table by doing this db.Categories

Once the entire categories table is fetched into Entity Framework, EF will use relationship span to fix the objectgraph so when you do category.SubCategories you would get all the children. The advantage of this is, you sql wouldn't be complex because it would be basically select * from categories. Much of the hard work would be done by EF in fixing the object graph so that all the children align correctly with their parent.

You can also use what someone else mentioned about using common table expression.

I cover two such concepts in my book.

5-11 Using Relationship Span 5-2 Loading a Complete Object Graph(CTE)

zeeshanhirani
+2  A: 

You may also use a pure SQL solution; here is a sample for SQL Server. It wouldn't be difficult to re-write it for a different DB manager:

/* Create table */
CREATE TABLE dbo.Nodes (ID int NOT NULL PRIMARY KEY, Parent int)

/* Insert sample data */
INSERT INTO Nodes VALUES (1,NULL)
INSERT INTO Nodes VALUES (2,1)
INSERT INTO Nodes VALUES (3,2)
INSERT INTO Nodes VALUES (4,3)
INSERT INTO Nodes VALUES (5,NULL)
INSERT INTO Nodes VALUES (6,5)

/* Create recursive function */
CREATE function dbo.fn_Root(@ID int) returns int
AS BEGIN
   DECLARE @R int

   SELECT @R = CASE WHEN Parent IS NULL THEN ID
                    ELSE dbo.fn_Root(Parent)
                 END
            FROM Nodes
           WHERE id = @id

   RETURN @R
END

/* Query the table */
SELECT ID, Parent, dbo.fn_Root(ID) AS Root
  FROM Nodes

/* Also, in SQL Server you can create a calculated column */
ALTER TABLE Nodes ADD Root AS dbo.fn_Root(id)

This is the basic version. But this one would fail if your data had closed loops (not a tree structure). To prevent code from entering in an endless loop, the function could be improved like this:

CREATE function dbo.fn_Root(@ID int, @Initial int) returns int
AS BEGIN
   DECLARE @R int

   DECLARE @Parent int
   SELECT @Parent = Parent FROM Nodes WHERE ID = @ID

   IF @Parent IS NULL 
      SELECT @R = @ID   /* No parent, the root is the node itself */
   ELSE
      IF @Parent = @Initial 
         /* We have returned to initial node: endless loop. We return NULL to indicate no root exists */
         SELECT @R = NULL
      ELSE
         /* The root will be the root of the parent node */
         SELECT @R = dbo.fn_Root(@Parent,@Initial)   

   RETURN @R

END

/* Query the table */
SELECT ID, Parent, dbo.fn_Root(ID,ID) AS Root FROM Nodes

With this modification, if function returns NULL, it indicates that the node is part of a loop, and so it has no root node.

ACB
A: 

Databases aren't really meant to do arbitrary depth recursion. Here's the desired operation done locally.

List<Item> items = context.Items.ToList();

Dictionary<int, Item> itemsById = items.ToDictionary(item => item.Id);

Dictionary<int, List<Item>> itemsByRoot = new Dictionary<int, List<Item>>();
List<Item> cyclicals = new List<Item>();

foreach(Item item in items)
{
  HashSet<int> seenIt = new HashSet<int>();
  Item parent = item;
  while (parent.ParentId != null && !seenIt[parent.Id])
  {
    seenIt.Add(parent.Id);
    parent = itemsById[parent.ParentId];
  }

  if (parent.ParentId == null)
  {
    if (!itemsByRoot.ContainsKey(parent.Id))
    {
      itemsByRoot[parent.Id] = new List<Item>();
    }
    itemsByRoot[parent.Id].Add(item);
  }
  else
  {
    cyclicals.Add(item);
  }
}
David B
Would there be any performance benefits from doing this instead of the accepted answer? And that feels way more complex than it has to be.
Filip Ekberg