ansaurus

Question

Should I worry about running out of HierarchyIDs?

Answer 1

+1 A:

It's considered a "best practice" with the hierarchyid to "append" new IDs so that you don't use those in-between states (such as /2/5.6/) at all. If your hierarchyid is a clustered primary key then that's bad for performance, it will cause page splits similar to the way a uniqueidentifier will.

If you generate sequential children, it's highly unlikely that you'd ever need to worry about running out; you can have literally millions of children for each parent.

Here is an example of how you're expected to generate hierarchyid values:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION 
    UPDATE Hierarchy
    SET @LastChild = LastChild = HId.GetDescendant(LastChild, NULL)
    WHERE HId = @ParentID

    INSERT Hierarchy (HId, ...)
    VALUES (@LastChild, ...)
COMMIT

If you generate the ids this way, rest assured you'll never have to worry about running out.

For curiosity's sake, I ran a quick test to find out for sure how deep you can go. Here's a test script:

DECLARE
    @parent hierarchyid,
    @child hierarchyid,
    @high hierarchyid,
    @cnt int

SET @parent = '/1/'
SET @child = @parent.GetDescendant(NULL, NULL)
SET @cnt = 0

WHILE (@@ERROR = 0)
BEGIN
    SET @cnt = @cnt + 1
    PRINT CAST(@cnt AS varchar(10)) + ': ' + @child.ToString()
    SET @high = @parent.GetDescendant(@child, @high)
    SET @child = @parent.GetDescendant(@child, @high)
END

You can see it error out at a point-nesting level of 1426, so that's your worst-case limit for how many "in-between" nodes you can create, worst case meaning that every single insertion goes in between the two most deeply-nested nodes.

As I mentioned in the comments, it's pretty hard to hit this limit, but that still doesn't make it a good idea to try. The actual byte length gets longer as longer as you use up more and more "points", which degrades performance. If the hierarchyid is your clustered index, this will kill performance by page splits. If you're trying to rank nodes by parent then use a ranking column instead; it's easier and more efficient to sort from a later SELECT than it is to do during your INSERT where you have to worry about transaction isolation and other such headaches.

Aaronaught 2010-05-15 14:47:49

My application is maintaining rankings. I actually have no need for slashes "/" in my paths. Inserting between existing elements is key.

Bruno Martinez 2010-05-17 12:54:24

@Bruno: You really shouldn't think of `hierarchyid` as a way of maintaining a specific ordering of children. Yes it's *capable* of doing that, but it's not *optimized* for that case at all. If you want rankings, then add a ranking column. (P.S. I'm not sure what you mean when you say you have no need for slashes - there aren't *actual* "slashes" in a `hierarchyid`, that's just the string representation.)

Aaronaught 2010-05-17 13:14:17

What type do you suggest to use for rankings, allowing insertion between any pair?

Bruno Martinez 2010-05-18 17:33:27

@Bruno: What is the actual criteria for ranking them? Use that, or compute some integral score out of it. Why do you need to make this into an insertion problem (complicated) as opposed to a sorting problem (easy)?

Aaronaught 2010-05-18 17:48:18

Anyway, you can use the `hierarchyid` for it if you really want, it's pretty hard to run into a practical limit with "point" ids, but it tends to hurt performance if you do it too much, especially if the `hierarchyid` is your clustered index.

Aaronaught 2010-05-18 17:58:25

ansaurus

tags:

views:

answers:

Should I worry about running out of HierarchyIDs?

related questions