tags:

views:

440

answers:

6

I'm curious to know what the best way (best practice) to handle hierarchies are in regards to database design. Here is a small example of how I usually handle them.

Node Table

NodeId int PRIMARY KEY
NodeParentId int NULL
DisplaySeq int NOT NULL
Title nvarchar(255)

Ancestor Table

NodeId int
AncestorId int
Hops int

with Indexes on NodeId, AncestorId, Hops

Tables look like this:

Node Table

NodeId    NodeParentId    DisplaySeq    Title
1         NULL            1             'Root'
2         1               1             'Child 1'
3         1               2             'Child 2'
4         2               1             'Grandchild 1'
5         2               2             'Grandchild 2'

Ancestor Table

NodeId    AncestorId    Hops
1         NULL          0
1         1             0
2         1             1
2         2             0
3         1             1
3         3             0
4         1             2
4         2             1
4         4             0
5         1             2
5         2             1
5         5             0

With this design, I've found that with large hierarchies I can get an entire section of the hierarchy very quickly by joining on the Ancestor table for AncestorId = target NodeId, like:

SELECT *
FROM Node n
INNER JOIN Ancestor a on a.NodeId=n.NodeId
WHERE a.AncestorId = @TargetNodeId

It's also easy to get direct children as well

SELECT *
FROM Node n
INNER JOIN Ancestor a on a.NodeId=n.NodeId
WHERE a.AncestorId = @TargetNodeId
AND Hops = 1

I'm interested in knowing what other solutions you may have used for this type of thing. In my experience, hierarchies can get pretty hairy, and any way to optimize their retrieval is very important.

+8  A: 

There are some vendor-specific extensions to do this, but my favorite db-neutral way comes from Joe Celko - google 'Joe Celko Trees and Hierarchies' or buy this book: link text

This is a very clever set-based way to go. Easy to query hierarchy. I added the 'parentID' field you have just because I ask the 'direct children' and 'parent' questions a lot and that speeds those up. But this is a wonderful way to get a 'ancestry' or 'descdent' query.

n8wrl
+5  A: 

You may also want to check out the "nested sets" pattern:

http://www.intelligententerprise.com/001020/celko.jhtml

Or you can Google for more.

P.S.: Curses, n8wrl, you type faster than I do!

MarkusQ
Nested Sets! That's the term I was looking for!
n8wrl
Pretty interesting article. The one problem I've always had is when adding/deleting a node, you have to update the position of every other node after it.
Rorschach
You do. And that's where Tom H's answer is so important. For me, this works wonderfully on hierarchies I have that change pretty infrequently.
n8wrl
+1  A: 

In Oracle, you can use CONNECT BY/START WITH to query hierarchial data. In SQL Server, you can use a stored procedure, that calls itself recursively.

baretta
I've used the recursive calls but the query runs very slowly, which is why I implemented the Ancestor table, so I could get away from a recursive call.
Rorschach
+2  A: 

SQL Server 2008 introduced the hierarchyid data type

SQLMenace
+3  A: 

As MarkusQ and n8wrl have already pointed out, Joe Celko has some good stuff on this. I'll just add that there are multiple ways to model a hierarchy (Joe's book contains several I believe, not just one that he considers the "best"). Your final decision will hopefully take into account your own specific needs. Some of the different ways to model it are better for write-intensive operations while others are better for frequent or fast reads up and down the hierarchy. Just keep in mind what your system will be doing with it.

Tom H.
A: 

I'd definitely recommend nested sets. They are great.

http://threebit.net/tutorials/nestedset/tutorial1.html http://www.dbmsmag.com/9603d06.html

Nathan Bertram