views:

175

answers:

3

Suppose a tree structure is implemented in SQL like this:

CREATE TABLE nodes (
    id INTEGER PRIMARY KEY,
    parent INTEGER -- references nodes(id)
);

Although cycles can be created in this representation, let's assume we never let that happen. The table will only store a collection of roots (records where parent is null) and their descendants.

The goal is to, given an id of a node on the table, find all nodes that are descendants of it.

A is a descendant of B if either A's parent is B or A's parent is a descendant of B. Note the recursive definition.

Here is some sample data:

INSERT INTO nodes VALUES (1, NULL);
INSERT INTO nodes VALUES (2, 1);
INSERT INTO nodes VALUES (3, 2);
INSERT INTO nodes VALUES (4, 3);
INSERT INTO nodes VALUES (5, 3);
INSERT INTO nodes VALUES (6, 2);

which represents:

1
`-- 2
    |-- 3
    |   |-- 4
    |   `-- 5
    |
    `-- 6

We can select the (immediate) children of 1 by doing this:

SELECT a.* FROM nodes AS a WHERE parent=1;

We can select the children and grandchildren of 1 by doing this:

SELECT a.* FROM nodes AS a WHERE parent=1
UNION ALL
SELECT b.* FROM nodes AS a, nodes AS b WHERE a.parent=1 AND b.parent=a.id;

We can select the children, grandchildren, and great grandchildren of 1 by doing this:

SELECT a.* FROM nodes AS a WHERE parent=1
UNION ALL
SELECT b.* FROM nodes AS a, nodes AS b WHERE a.parent=1 AND b.parent=a.id
UNION ALL
SELECT c.* FROM nodes AS a, nodes AS b, nodes AS c WHERE a.parent=1 AND b.parent=a.id AND c.parent=b.id;

How can a query be constructed that gets all descendants of node 1 rather than those at a fixed depth? It seems like I would need to create a recursive query or something.

I'd like to know if such a query would be possible using SQLite. However, if this type of query requires features not available in SQLite, I'm curious to know if it can be done in other SQL databases.

+4  A: 

The way you've set up your schema doesn't really suit itself very well to the relational model (as you've discovered). But there is another way that may not be so obvious at first, but is much more flexible. It's called a "nested set model" and it just structures things slightly differently.

Here's a link with specifics for MySQL, but it should translate to SQLite fairly easily. I won't go into too much detail here, because that article is actually pretty long, but it should give you all you need to get going.

Dean Harding
I believe the relational model is quite suitable. I defined `descendant` using strictly first-order logic, if I'm not mistaken :-)
Joey Adams
The solution in the link is fixed depth, while the question asks for any depth. Only difference with the query in the question is the use of `left join` instead of `union all`
Andomar
@Andomar: I don't think you read the whole linked article... start from the section titled "The Nested Set Model"
Dean Harding
@codeka: Wow... that would give high cost updates! But it's definitely any depth, +1 :)
Andomar
Yeah, updates/inserts/deletes are pretty expensive. I guess it depends on your typical usage. For mostly-read data it's actually surprisingly quick, and flexible. This would be well-suited for large datasets, whereas I'd choose yours if the dataset was smaller.
Dean Harding
+1  A: 

Oracle has a CONNECT BY syntax that could be used for this, but of course it's specific to oracle.

MeBigFatGuy
Both the question and the tags specify SQLite...
Andomar
@Andomar: As I said in the original post: "However, if this type of query requires features not available in SQLite, I'm curious to know if it can be done in other SQL databases."
Joey Adams
+3  A: 

Some databases allow that using recursive common table expressions, but not SQLLite.

You could consider changing your table definition. With a table like this, it's easy to query all descendants of 1:

id (varchar)
--------------
001
001002
001002003
001002003004
001002003005
001002006

This allows you to query all descendants of 1 like:

select * from YourTable where id like '001%'

It sounds a bit whacky, but works very well in practice.

Andomar
Definitely wacky, definitely awesome! Also, what about delimiting IDs with slashes? (e.g. "1/2/3/4")
Joey Adams
I like this, it's much simpler than my suggestion, though perhaps not quite as versatile. I guess it depends on your actual needs, though...
Dean Harding
its basically a url :) you could just use a url if you wanted and use the same technique. also could put a marker in for leaf nodes if you happened to want to query for those
Keith Nicholas
My program stores and deduplicates files (roughly a few million) at the block level in an archive. Both insertion and lookup need to be fast, so I think a solution like this one would be best.
Joey Adams