Assuming that there's a table called tree_node
, that has the primary key called id
and has a nullable column called parent_id
and the table embeds a tree structure (or a forest), how can one compute the depth of a node in the tree/forest in an efficient way in SQL?
views:
123answers:
3You'll need recursion capability. Using recursive queries this will be:
WITH RECURSIVE node_ancestors(node_id, parent_id) AS (
SELECT id, id FROM tree_node WHERE id IN (1, 2, 3)
UNION ALL
SELECT na.node_id, tree_node.parent_id
FROM node_ancestors AS na, tree_node
WHERE tree_node.id = na.parent_id AND tree_node.parent_id IS NOT NULL
)
SELECT node_id, COUNT(parent_id) AS depth FROM node_ancestors GROUP BY node_id;
The other options are doing the recursion in stored procedures, doing the recursion in the application and limiting the amount of recursion and using lots of joins. (The last option isn't really efficient for non-trivial depths)
If the depth is unlimited, then the problem is recursive, and there's not really an easy and efficient solution. (There may be easy xor efficient solutions.) If you have control of the schema, and you need to access this sort of data regularly, then you could refactor the table as a nested set, with left and right containment bounds for each element. This would allow you to compute the depth of a node in a single query with elementary conditions, approximately:
select count(*) from tree_node
where left < myLeft
and right > myRight
A common way to handle trees is, in addition to the regular columns of KEY and PARENT, you also have a PATH type of columns, which contains a string value, containing the keys of the nodes that make up the path, from the root, down to and including the node itself, delimited by a character that is not allowed to be part of a key itself.
Let me give you an example:
KEY PARENT PATH
1 - *1*
2 1 *1*2*
3 1 *1*3*
4 3 *1*3*4*
5 1 *1*5*
6 5 *1*5*6*
This is mostly used with trees that doesn't change too much, like department hierarchies or similar.
I know such a string isn't exactly following the normalization theories, since it seems to invalidate multiple rules (multi-keys, multi-valued fields, etc.), but it is quite useful in many scenarios, including the one you're asking for.
In your case, you simply have to retrieve the TREE value for the node in question, and depending on what is easiest, either count the number of delimiter characters, or removing them with a replace function, and calculating the difference in lengths.
Here's SQL to give you the above list of nodes, with their depths:
select KEY, PARENT, LEN(PATH)-LEN(REPLACE(PATH, '*', ''))-1 as DEPTH
from NODES
Note that this approach does not require any special syntax or support by the database engine for recursive SQL, and lends itself very nicely to indexing as well.