views:

121

answers:

3

I have a database table (sqlite) containing items that form a tree hierarchy. Each item has an id field (for itself) and a parentId for its parent. Now given an item, I must retrieve the whole chain from the root to the item.

Basically the algorithm in pseudocode looks like:

  1. cursor is item
  2. retrieve parentItem for cursor by parentId
  3. if parentItem is not rootItem, then cursor = parentItem and goto 2.

So I have to perform an SQL SELECT query for each item.

Is it possible to retrieve the whole chain rootItem -> ... -> item by performing only one SQL query?

A: 

Not with ANSI standard SQL it isn't, no. Well, that's not strictly true. You can do left outer joins and put in enough to cover the likely maximum depth but unless you restrain the max depth and include that many joins, it won't always work.

If your set of rows is sufficiently small (say less than 1000), just retrieve them all and then figure it out. It'll be faster than single read traversals in all likelihood.

You could batch the parent traversal. Have a query like:

SELECT t1.id id1, t1.parent parent1,
       t2.id id2, t2.parent parent2,
       t3.id id3, t3.parent parent3,
       t4.id id4, t4.parent parent4,
       t5.id id5, t5.parent parent5
FROM mytable t1
LEFT OUTER JOIN mytable t2 ON t1.parent = t2.id
LEFT OUTER JOIN mytable t3 ON t2.parent = t3.id
LEFT OUTER JOIN mytable t4 ON t3.parent = t4.id
LEFT OUTER JOIN mytable t5 ON t4.parent = t5.id
WHERE t1.id = 1234

and extend it to whatever number you want. If the last retrieved parent isn't null you aren't at the top of the tree yet so run the query again. This way you should hopefully reduce it to 1-2 roundtrips.

Other than that you could look at ways of encoding that data in the ID. This isn't recommended but if you limit, say, each node to having 100 children you could say that node with an ID 10030711 has path of 10 -> 03 -> 07 -> 11. That of course has other problems (like max ID length) and of course it's hacky.

It's also worth noting that there are two basic models for hierarchical data in SQL. Adjacency lists and nested sets. Your way (which is pretty common) is an adjacency set. Nested sets wouldn't really help with this situation though and they are complicated to do inserts on.

cletus
Unfortunately my set of rows is reasonably big and is even constantly growing.
JS_is_bad
+2  A: 

There are lots of creative ways of organizing hierarchial data in a database, but consistently I find it easiest bring back the data in non-hierarchial format, then match up parent and child records programmatically.

Total amount of effort: 1 query + 1 programmatic pass through your dataset to create the hierarchy.


Alternative approach:

I've used this method in the past with limited success. You can store the path of each item in your tree using a varchar(max) column as follows:

ID    ParentID    Path
--    --------    ----
1     null        1/
2     1           1/2/
3     null        3/
4     2           1/2/4/
5     4           1/2/4/5/
6     null        6/
7     5           1/2/4/5/7/
9     5           1/2/4/5/9/

From that point, getting all of the nodes under ID = 5 is a very simple:

SELECT *
FROM table
WHERE Path like (SELECT Path FROM Table WHERE ID = 5) + '%'
Juliet
Nice technique, I would +1 it but I'm out of votes for a couple hours. Why do you say **limited** success tho?
Alix Axel
Why not just SELECT * FROM table WHERE Path LIKE '%/5/%'; ?
Alix Axel
@eyze: sure, that could work too :) But, in almost all database implementations, the database can't use indexes on like expressions which begin with a wildcard character. See the SQLite documentation (http://www.sqlite.org/optoverview.html): "Terms that are composed of the LIKE or GLOB operator can sometimes be used to constrain indices. There are many conditions on this use: [...] The right-hand side of the LIKE or GLOB must be a string literal that does not begin with a wildcard character". I'm not sure that my code above would use indices eithers since its not a string literal. YMMV.
Juliet
@eyze: maybe the phrase "limited success" isn't the best verbiage. I've used the technique on small personal projects, but never really tried it out on *big* tables or *deep* hierarchies. Still worth experimenting with some more, however :)
Juliet
A: 

are you able to change the table structure? Looks like storing left and right nodes would be easier to work with than storing just a parent because then a single select is possible. See the following links:

http://www.mail-archive.com/[email protected]/msg23867.html

http://weblogs.asp.net/aghausman/archive/2009/03/16/storing-retrieving-hierarchical-data-in-sql-server-database.aspx (this is SQLServer, but they have a diagram that might help.)

sql_mommy