views:

123

answers:

3
+1  Q: 

Tree depth in SQL

Assuming that there's a table called tree_node, that has the primary key called id and has a nullable column called parent_id and the table embeds a tree structure (or a forest), how can one compute the depth of a node in the tree/forest in an efficient way in SQL?

+1  A: 

You'll need recursion capability. Using recursive queries this will be:

WITH RECURSIVE node_ancestors(node_id, parent_id) AS (
        SELECT id, id FROM tree_node WHERE id IN (1, 2, 3)
    UNION ALL
        SELECT na.node_id, tree_node.parent_id
            FROM node_ancestors AS na, tree_node
            WHERE tree_node.id = na.parent_id AND tree_node.parent_id IS NOT NULL
)
SELECT node_id, COUNT(parent_id) AS depth FROM node_ancestors GROUP BY node_id;

The other options are doing the recursion in stored procedures, doing the recursion in the application and limiting the amount of recursion and using lots of joins. (The last option isn't really efficient for non-trivial depths)

Ants Aasma
Nice feature. Unfortunately not present on all SQL implementations.
Marian
A: 

If the depth is unlimited, then the problem is recursive, and there's not really an easy and efficient solution. (There may be easy xor efficient solutions.) If you have control of the schema, and you need to access this sort of data regularly, then you could refactor the table as a nested set, with left and right containment bounds for each element. This would allow you to compute the depth of a node in a single query with elementary conditions, approximately:

select count(*) from tree_node
    where left < myLeft
    and right > myRight
Thom Smith
+1  A: 

A common way to handle trees is, in addition to the regular columns of KEY and PARENT, you also have a PATH type of columns, which contains a string value, containing the keys of the nodes that make up the path, from the root, down to and including the node itself, delimited by a character that is not allowed to be part of a key itself.

Let me give you an example:

KEY        PARENT         PATH
1          -              *1*
  2        1              *1*2*
  3        1              *1*3*
    4      3              *1*3*4*
  5        1              *1*5*
    6      5              *1*5*6*

This is mostly used with trees that doesn't change too much, like department hierarchies or similar.

I know such a string isn't exactly following the normalization theories, since it seems to invalidate multiple rules (multi-keys, multi-valued fields, etc.), but it is quite useful in many scenarios, including the one you're asking for.

In your case, you simply have to retrieve the TREE value for the node in question, and depending on what is easiest, either count the number of delimiter characters, or removing them with a replace function, and calculating the difference in lengths.

Here's SQL to give you the above list of nodes, with their depths:

select KEY, PARENT, LEN(PATH)-LEN(REPLACE(PATH, '*', ''))-1 as DEPTH
from NODES

Note that this approach does not require any special syntax or support by the database engine for recursive SQL, and lends itself very nicely to indexing as well.

Lasse V. Karlsen
Except its not invalidating the multi-valued field, since its still only a single specific path, one value. Also, at the same time as storing path, you could compute the depth once and store that if its going to be a common requirement.
KeeperOfTheSoul
+1, this is my favourite tree method.
RedFilter