views:

5918

answers:

9

I have some hierarchical data - each entry has an id and a (nullable) parent entry id. I want to retrieve all entries in the tree under a given entry. This is in a SQL Server 2005 database. I am querying it with LINQ to SQL in C# 3.5.

LINQ to SQL does not support Common Table Expressions directly. My choices are to assemble the data in code with several LINQ queries, or to make a view on the database that surfaces a CTE.

Which option (or another option) do you think will perform better when data volumes get large? Is SQL Server 2008's HierarchyId type supported in Linq to SQL?

+2  A: 

In MS SQL 2008 you could use HierarchyID directly, in sql2005 you may have to implement them manually. ParentID is not that performant on large data sets. Also check this article for more discussion on the topic.

Ilya Ryzhenkov
There's no mention there if HierarchyID is usable in LINQ to SQL
Anthony
wow, this is definitely the answer. nice post!
Shawn Simon
it's not usable in linq2sql out of the box
Dmitri Nesteruk
+3  A: 

I would set up a view and an associated table-based function based on the CTE. My reasoning for this is that, while you could implement the logic on the application side, this would involve sending the intermediate data over the wire for computation in the application. Using the DBML designer, the view translates into a Table entity. You can then associate the function with the Table entity and invoke the method created on the DataContext to derive objects of the type defined by the view. Using the table-based function allows the query engine to take your parameters into account while constructing the result set rather than applying a condition on the result set defined by the view after the fact.

CREATE TABLE [dbo].[hierarchical_table](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [parent_id] [int] NULL,
    [data] [varchar](255) NOT NULL,
 CONSTRAINT [PK_hierarchical_table] PRIMARY KEY CLUSTERED 
(
    [id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

CREATE VIEW [dbo].[vw_recursive_view]
AS
WITH hierarchy_cte(id, parent_id, data, lvl) AS
(SELECT     id, parent_id, data, 0 AS lvl
      FROM         dbo.hierarchical_table
      WHERE     (parent_id IS NULL)
      UNION ALL
      SELECT     t1.id, t1.parent_id, t1.data, h.lvl + 1 AS lvl
      FROM         dbo.hierarchical_table AS t1 INNER JOIN
                            hierarchy_cte AS h ON t1.parent_id = h.id)
SELECT     id, parent_id, data, lvl
FROM         hierarchy_cte AS result


CREATE FUNCTION [dbo].[fn_tree_for_parent] 
(
    @parent int
)
RETURNS 
@result TABLE 
(
    id int not null,
    parent_id int,
    data varchar(255) not null,
    lvl int not null
)
AS
BEGIN
    WITH hierarchy_cte(id, parent_id, data, lvl) AS
   (SELECT     id, parent_id, data, 0 AS lvl
     FROM         dbo.hierarchical_table
     WHERE     (id = @parent OR (parent_id IS NULL AND @parent IS NULL))
     UNION ALL
     SELECT     t1.id, t1.parent_id, t1.data, h.lvl + 1 AS lvl
     FROM         dbo.hierarchical_table AS t1 INNER JOIN
      hierarchy_cte AS h ON t1.parent_id = h.id)
    INSERT INTO @result
    SELECT     id, parent_id, data, lvl
    FROM         hierarchy_cte AS result
RETURN 
END

ALTER TABLE [dbo].[hierarchical_table]  WITH CHECK ADD  CONSTRAINT [FK_hierarchical_table_hierarchical_table] FOREIGN KEY([parent_id])
REFERENCES [dbo].[hierarchical_table] ([id])

ALTER TABLE [dbo].[hierarchical_table] CHECK CONSTRAINT [FK_hierarchical_table_hierarchical_table]

To use it you would do something like -- assuming some reasonable naming scheme:

using (DataContext dc = new HierarchicalDataContext())
{
    HierarchicalTableEntity h = (from e in dc.HierarchicalTableEntities
                                 select e).First();
    var query = dc.FnTreeForParent( h.ID );
    foreach (HierarchicalTableViewEntity entity in query) {
        ...process the tree node...
    }
}
tvanfosson
I have tried out a function like this, and it seems to be the way to go. And it can be called from LINQ, attached to the datacontext. Also, why both the view and the function? - they seem to be duplication
Anthony
The function doesn't map the same schema as the table. It includes the level. If you didn't have the added column, you could map it directly onto the table. I assumed level in the hierarchy was important.
tvanfosson
+2  A: 

I have done this two ways:

  1. Drive the retrieval of each layer of the tree based on user input. Imagine a tree view control populated with the root node, the children of the root, and the grandchildren of the root. Only the root and the children are expanded (grandchildren are hidden with the collapse). As the user expands a child node the grandchildren of the root are display (that were previously retrieved and hidden), and a retrieval of all of the great-grandchildren is launched. Repeat the pattern for N-layers deep. This pattern works very well for large trees (depth or width) because it only retrieves the portion of the tree needed.
  2. Use a stored procedure with LINQ. Use something like a common table expression on the server to build your results in a flat table, or build an XML tree in T-SQL. Scott Guthrie has a great article about using stored procs in LINQ. Build your tree from the results when they come back if in a flat format, or use the XML tree if that is that is what you return.
Jason Jackson
+1  A: 

This extension method could potentially be modified to use IQueryable. I've used it succesfully in the past on a collection of objects. It may work for your scenario.

public static IEnumerable<T> ByHierarchy<T>(
 this IEnumerable<T> source, Func<T, bool> startWith, Func<T, T, bool> connectBy)
{
  if (source == null)
   throw new ArgumentNullException("source");

  if (startWith == null)
   throw new ArgumentNullException("startWith");

  if (connectBy == null)
   throw new ArgumentNullException("connectBy");

  foreach (T root in source.Where(startWith))
  {
   yield return root;
   foreach (T child in source.ByHierarchy(c => connectBy(root, c), connectBy))
   {
    yield return child;
   }
 }
}

Here is how I called it:

comments.ByHierarchy(comment => comment.ParentNum == parentNum, 
 (parent, child) => child.ParentNum == parent.CommentNum && includeChildren)

This code is an improved, bug-fixed version of the code found here.

JarrettV
Or you can check out where he snagged that from: http://weblogs.asp.net/okloeten/archive/2006/07/09/Hierarchical-Linq-Queries.aspx
TheSoftwareJedi
I added attribution to the Jedi. My version is simplified and improved.
JarrettV
A: 

I got this approach from Rob Conery's blog (check around Pt. 6 for this code, also on codeplex) and I love using it. This could be refashioned to support multiple "sub" levels.

var categories = from c in db.Categories
                 select new Category
                 {
                     CategoryID = c.CategoryID,
                     ParentCategoryID = c.ParentCategoryID,
                     SubCategories = new List<Category>(
                                      from sc in db.Categories
                                      where sc.ParentCategoryID == c.CategoryID
                                      select new Category {
                                        CategoryID = sc.CategoryID, 
                                        ParentProductID = sc.ParentProductID
                                        }
                                      )
                             };
Codewerks
But can it be refashioned to support an unlimited number of sublevels?
Anthony
You aren't going to add a dozen sub categories to this query - it is not particularly flexible.
Kirk Broadhurst
A: 

The trouble with fetching the data from the client side is that you can never be sure how deep you need to go. This method will do one roundtrip per depth and it could be union'd to do from 0 to a specified depth in one roundtrip.

public IQueryable<Node> GetChildrenAtDepth(int NodeID, int depth)
{
  IQueryable<Node> query = db.Nodes.Where(n => n.NodeID == NodeID);
  for(int i = 0; i < depth; i++)
    query = query.SelectMany(n => n.Children);
       //use this if the Children association has not been defined
    //query = query.SelectMany(n => db.Nodes.Where(c => c.ParentID == n.NodeID));
  return query;
}

It can't, however, do arbitrary depth. If you really do require arbitrary depth, you need to do that in the database - so you can make the correct decision to stop.

David B
+3  A: 

This option might also prove useful:

LINQ AsHierarchy() extension method
http://www.scip.be/index.php?Page=ArticlesNET18

I am using this and it works extremely well, especially the updated version referenced at the bottom.
Chris Ridenour
+1  A: 

I am surprised nobody has mentioned an alternative database design - when hierarchy needs to be flattened from multiple levels and retrieved with high performance (not so considering storage space) it is better to use another entity-2-entity table to track hierarchy instead of parent_id approach.

It will allow not only single parent relations but also multi parent relations, level indications and different types of relationships:

CREATE TABLE Person (
  Id INTEGER,
  Name TEXT
);

CREATE TABLE PersonInPerson (
  PersonId INTEGER NOT NULL,
  InPersonId INTEGER NOT NULL,
  Level INTEGER,
  RelationKind VARCHAR(1)
);
too
A: 

Please read the following link.

http://support.microsoft.com/default.aspx?scid=kb;en-us;q248915

Jacob Jojan
I don't like that method - "while" loops are not very good SQL practice, and if there's a more declarative way to do it, that should be preferred instead. And there is now: use a view or table-based function using the Common Table Expression, using the WITH .. UNION ALL construct as shown in other answers here.
Anthony