I am trying to implement hierarchyID in a table (dbo.[Message]) containing roughly 50,000 rows (will grow substantially in the future). However it takes 30-40 seconds to retrieve about 25 results.

The root node is a filler that exists only to provide uniqueness; therefore, every subsequent row is a descendant of that dummy row.

I need to traverse the table depth-first, so I have made the hierarchyID column (dbo.[Message].MessageID) the clustered primary key. I have also added a computed smallint column (dbo.[Message].Hierarchy) that stores the level of the node.

Usage: A .NET application passes a hierarchyID value into the database, and I want to retrieve all (if any) children AND parents of that node (besides the root, as it is filler).

A simplified version of the query I am using:

DECLARE @MessageID hierarchyid;   /* passed in from application */

SELECT m.MessageID, m.MessageComment
FROM dbo.[Message] AS m
WHERE m.MessageID.IsDescendantOf(@MessageID.GetAncestor(@MessageID.GetLevel() - 1)) = 1
ORDER BY m.MessageID;

From what I understand, the index should be detected automatically without a hint.

From searching forums I have seen people utilizing index hints when dealing with breadth-first indexes, but have not observed this application in depth-first situations. Would that be a relevant approach for my scenario?

I have spent the past few days trying to find a solution to this issue, to no avail, and would greatly appreciate any assistance. As this is my first post, I apologize in advance if this is considered a 'noobish' question. I have read the MS documentation and searched countless forums, but have not come across a succinct description of this specific issue.

+2  A: 

It's not entirely clear whether you're trying to optimize for depth-first or breadth-first search; the question suggests depth-first, but the comments at the end are about breadth-first.

You have all the indexes you need for depth-first (just index the hierarchyid column). For breadth-first, it's not enough just to create the computed level column; you have to index it too:

ALTER TABLE Message
ADD [Level] AS MessageID.GetLevel();

CREATE INDEX IX_Message_BreadthFirst
ON Message ([Level], MessageID)
INCLUDE (...);

(Note that for non-clustered indexes you'll most likely need the INCLUDE - otherwise, SQL Server may resort to doing a clustered index scan instead.)
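As a rough illustration of the kind of query a breadth-first index is meant for, here is a minimal sketch that fetches only the direct children of one node. The temp table #Message, its sample rows, and the MessageComment values are hypothetical stand-ins, not the asker's real schema, and whether the optimizer actually picks the index still has to be confirmed from the execution plan.

```sql
-- Hypothetical demo table; stands in for dbo.[Message].
CREATE TABLE #Message
(
    MessageID      hierarchyid   NOT NULL PRIMARY KEY CLUSTERED,  -- depth-first order
    [Level]        AS MessageID.GetLevel(),                       -- computed level column
    MessageComment nvarchar(100) NULL
);

-- Breadth-first index: rows on the same level are stored together.
CREATE INDEX IX_Message_BreadthFirst
ON #Message ([Level], MessageID)
INCLUDE (MessageComment);

-- Made-up sample rows.
INSERT INTO #Message (MessageID, MessageComment)
VALUES (hierarchyid::GetRoot(),            N'root filler'),
       (CAST('/1/'     AS hierarchyid),    N'thread 1'),
       (CAST('/1/1/'   AS hierarchyid),    N'reply 1.1'),
       (CAST('/1/2/'   AS hierarchyid),    N'reply 1.2'),
       (CAST('/1/1/1/' AS hierarchyid),    N'reply 1.1.1'),
       (CAST('/2/'     AS hierarchyid),    N'thread 2');

DECLARE @parent hierarchyid = CAST('/1/' AS hierarchyid);

-- Direct children only: one level below @parent and inside its subtree.
SELECT m.MessageID.ToString() AS NodePath, m.MessageComment
FROM #Message AS m
WHERE m.[Level] = @parent.GetLevel() + 1
  AND m.MessageID.IsDescendantOf(@parent) = 1
ORDER BY m.MessageID;

DROP TABLE #Message;
```

With these sample rows the query returns /1/1/ and /1/2/ but not the grandchild /1/1/1/, because the level predicate restricts the result to a single generation.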

Now, if you're trying to find all ancestors of a node, you want to take a slightly different tack. You can make these searches lightning-fast, because - and here's what's cool about hierarchyid - each node already "contains" all of its ancestors.

I use a CLR function to make this as fast as possible, but you can do it with a recursive CTE:

CREATE FUNCTION dbo.GetAncestors
(
    @h hierarchyid
)
RETURNS TABLE
AS RETURN
WITH Hierarchy_CTE AS
(
    SELECT @h AS id

    UNION ALL

    SELECT h.id.GetAncestor(1)
    FROM Hierarchy_CTE h
    WHERE h.id <> hierarchyid::GetRoot()
)
SELECT id FROM Hierarchy_CTE
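To see what this CTE produces, here is a small standalone sketch that runs the same recursion against a literal node. The path '/1/3/2/' is a made-up value, and the extra filter on the root is an assumption based on the question, where the root is only a filler row.

```sql
DECLARE @h hierarchyid = CAST('/1/3/2/' AS hierarchyid);  -- hypothetical node

WITH Hierarchy_CTE AS
(
    -- Anchor: the node itself.
    SELECT @h AS id

    UNION ALL

    -- Recursive step: walk up one level until the root is reached.
    SELECT h.id.GetAncestor(1)
    FROM Hierarchy_CTE AS h
    WHERE h.id <> hierarchyid::GetRoot()
)
SELECT id.ToString() AS NodePath
FROM Hierarchy_CTE
WHERE id <> hierarchyid::GetRoot()   -- drop the filler root
ORDER BY id;
```

No table is touched here: every ancestor is derived from the value itself. Without the final WHERE, the root '/' would also appear in the result, and the starting node is always included.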

Now, to get all of the ancestors and descendants, use it like this:

DECLARE @MessageID hierarchyID   /* passed in from application */

SELECT m.MessageID, m.MessageComment 
FROM Message as m
WHERE m.MessageId.IsDescendantOf(@MessageID) = 1
OR m.MessageId IN (SELECT id FROM dbo.GetAncestors(@MessageID.GetAncestor(1)))
ORDER BY m.MessageID

Try it out - this should solve your performance problems.

Aaronaught
Sorry for the confusion, depth-first is indeed what I'm after! Thanks a lot for the suggestion, I shall try it right away.
AndalusianCat
Just for testing purposes, I have removed @MessageID.GetAncestor altogether, leaving just: m.MessageId.IsDescendantOf(@MessageID) = 1 in the WHERE clause. When I ran that proc, seek time was still between 150 and 420 ms per result, which is very slow for my application. Performance is a priority. I am completely unfamiliar with CLR, but I would really like to learn how to implement it if that would provide the best performance. Any suggestions on where to start?
AndalusianCat
@AndalusianCat: The CLR version is for the ancestor query. If you're finding it slow just using `IsDescendantOf`, please post an actual query, a table schema (including indexes) and the execution plan. `hierarchyid` queries are typically much faster than that.
Aaronaught
A: 

Found workaround here: http://connect.microsoft.com/SQLServer/feedback/details/532406/performance-issue-with-hierarchyid-fun-isdescendantof-in-where-clause#

As a reminder, I started with a hierarchyID passed in from the application, and my goal is to retrieve any and all relatives of that value (both ancestors and descendants).

In my specific example, I had to add the following declarations before the SELECT statement:

declare @topNode hierarchyid = (select @messageID.GetAncestor(@messageID.GetLevel() - 1))
declare @topNodeParent hierarchyid = (select @topNode.GetAncestor(1))
declare @leftNode hierarchyid = (select @topNodeParent.GetDescendant(null, @topNode))
declare @rightNode hierarchyid = (select @topNodeParent.GetDescendant(@topNode, null))

The WHERE clause has been changed to:

messageid.IsDescendantOf(@topNode)=1 AND (messageid > @leftNode ) AND (messageid < @rightNode )
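To make the bounds easier to picture, here is a minimal sketch that computes the same four variables for a literal node and prints them. The value '/2/5/1/' is hypothetical; in the real procedure @messageID comes from the application.

```sql
DECLARE @messageID hierarchyid = CAST('/2/5/1/' AS hierarchyid);  -- hypothetical input

-- Top-level ancestor directly under the filler root.
DECLARE @topNode hierarchyid = @messageID.GetAncestor(@messageID.GetLevel() - 1);
-- Its parent, which is the filler root in this layout.
DECLARE @topNodeParent hierarchyid = @topNode.GetAncestor(1);
-- A sibling position sorting just before @topNode.
DECLARE @leftNode hierarchyid = @topNodeParent.GetDescendant(NULL, @topNode);
-- A sibling position sorting just after @topNode.
DECLARE @rightNode hierarchyid = @topNodeParent.GetDescendant(@topNode, NULL);

SELECT @topNode.ToString()   AS TopNode,
       @leftNode.ToString()  AS LeftBound,
       @rightNode.ToString() AS RightBound;
```

Because hierarchyid values sort depth-first, every descendant of @topNode falls strictly between the two sibling bounds. The extra comparisons therefore do not change the result set; as far as I can tell, they only hand the optimizer an explicit range on the clustered key to seek on.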

The querying performance increase is very significant:

For every value passed in, seek time is now 20 ms on average (previously 120 to 420 ms).

When querying 25 values, it previously took 25 - 35 seconds to return all related nodes (in some cases each value had many relatives, in some there were none). It now takes only 2 seconds.

Thank you very much to all who have contributed to this issue on this site and on others.

AndalusianCat