views:

431

answers:

1

Hi to all, I've read about various ways of representing hierarchical structure within a relational database like Adjacency List.

I have decided to try a straight forward way like a table (oversimplified) done like this: id | name | parent where parent is a inner reference to id.

This should be enough to represent a simple tree of undefined depth.

Now, how can i construct a tree to represent this kind of data structure? Like, if I want to produce an XML or a series of nested <ul><li> to print it in HTML format, what is the most efficient way to loop through the nodes? I wonder if Linq can come to help, but I'm also interested in more general (not .NET bound) and theoretical answers.

Thanks in advance.

+1  A: 

Ironically, the hardest part of this problem can be getting a constrained set of records from the database that satisfy a particular path predicate. Unless you're using a database that provides hierarchical query support (like Oracle's CONNECT BY), this can get quite complicated. Check out this SO question for some approaches on that part.

But, let's assume you've managed to load your data ... you can do so into either a custom object collection, or into a DataSet, or into an XML structure (if you don't care about manipulating the data other than as strings). If you have a custom object, you can add a property that exposes the child collection by querying the entire data set for records that match on the parent key (see the example below). If the number of records is large, this starts to become non-performant because of the cost of searching the collection. This is where the DataSet solution shines - since you can add custom indexes on the ParentID column that will improve the performance of lookups. You can read about Typed Data Sets here to learn more about how to extend my example below in that direction.

Here's an example of a custom object with tree navigation support. I'm using a directory structure (folders) because the are a natural example of hierarchical data. Again, be wary of using this code without an indexed structure as a full child-traversal will be O(n2).

class Folder  // could be converted to a DataRow-derivative
{
    public int Id          { get; set; }
    public string Name     { get; set; }
    public int? ParentId   { get; set; }

    public IEnumerable<Folder> GetChildren( IEnumerable<Folder> allFolders )
    {
        // NOTE: becomes expensive on large hierarchies on unindexed data
        //   a slightly better version, could replace IEnumerable<Folder>
        //   with IOrderedEnumerable<Folder> which would allow the use of
        //   a binary search algorithm for finding children (assuming it's
        //   ordered on the ParentId property).
        return AllFolders.Where( x => x.ParentId.HasValue && 
                                      x.ParentId.Value = this.Id );
    }
}

// elsewhere in your code...

void LoadAndPrintFolders()
{
   // load the folder data... from database, or wherever
   IEnumerable<Folder> allFolders = LoadFolders();

   // find all root folders (let's say, those that have a null ParentId
   IEnumerable<Folder> rootFolders = 
          allFolders.Where( x => !x.ParentId.HasValue );

   // iterate over all of the data as a tree structure...
   foreach( var folder in rootFolders )
   {
       PrintFolder( allFolders, folder, 0 );
   }

}

// recursive function to print all folders...
void PrintFolder( IEnumerable<Folder> allFolders, Folder f, int level )
{
    Console.WriteLine( "{0}{1}", new string('\t',level), f.Name );
    foreach( var folder in f.GetChildren( allFolders )
    {
        PrintFolder( allFolders, folder, level + 1 );
    }
}
LBushkin