views:

1831

answers:

6

I'm working on a project wich requires a tree of categories, organized as id, parent, title table. Which are the best ways to retrieve category and its subcategories(and a full tree, if root categories have parent=0) in Postgres? I'm looking for a pure database solution, but if there is a way for Ruby and PHP - it will be great too.

The main goal is speed of select clauses, 'cause data in this table is not critical for update/insert/delete speed.

UPD: there will be also path searching, i mean path from current vertex(category) to root category.

+2  A: 

Take a look at the "ltree" contrib module.

Milen A. Radev
+2  A: 

retrieve category and its subcategories

If you only have a limited depth of subitems, you can do this using a self-join, eg. two levels deep:

SELECT *
FROM categories AS child
LEFT JOIN categories AS parent ON parent.id=child.parent
LEFT JOIN categories AS grandparent ON grandparent.id=parent.parent
WHERE child.id=(id) OR parent.id=(id) OR grandparent.id=(id);

You can't do this for an arbitrary-depth hierarchy using standard SQL over a ‘parent-id-foreign-key’ type schema.

Some DBMSs provide non-standard hierarchical tools that allow something like this in various ways, but if you want to stick to cross-DBMS-compatible code you'll need to rejig your schema to one of the better models of representing hierarchies. The two popular ones are:

  • Nested Set. Stores a linear ordering representing a depth-first search of the tree in two columns of the target table (one of which you'll already have if your target has explicit ordering).

  • Adjacency Relation. Stores each ancestor/descendent pair in a separate join table.

There are advantages and disadvantages to each approach, and numerous variants (eg. sparse nested set numbering, ‘distance’ in AR) which can affect how expensive various types of add/delete/move-position operations are. Personally I tend to gravitate towards a simplified nested set model by default as it contains less redundancy than AR.

bobince
You *can* do this for an arbitrary-depth hierarchy using standard SQL over a ‘parent-id-foreign-key’ type schema: you can use a recursive common table expression. Admittedly, PostgreSQL only supported this from 8.4, which came out months after this thread, but i thought this might be a useful addendum. FWIW, Firebird, MS SQL Server and DB2 support recursive CTEs too, although MSSQL's version is limited. Oracle has its own weird syntax.
Tom Anderson
+1  A: 

I've been playing around with ltree, which is a PostgreSQL contrib module to see if it is a good fit for threaded comments. You create a column in your table that stores the path and create an ltree index on it.. You can then perform queries like this:

 ltreetest=# select path from test where path ~ '*.Astronomy.*';
                     path                      
-----------------------------------------------
 Top.Science.Astronomy
 Top.Science.Astronomy.Astrophysics
 Top.Science.Astronomy.Cosmology
 Top.Collections.Pictures.Astronomy
 Top.Collections.Pictures.Astronomy.Stars
 Top.Collections.Pictures.Astronomy.Galaxies
 Top.Collections.Pictures.Astronomy.Astronauts

I haven't played around with it enough to determine how well it performs with things like inserts, updates or deletes. I assume a delete would look like:

DELETE FROM test WHERE path ~ '*.Astronomy.*';

I'm thinking, a threaded comment table might look like:

CREATE SEQUENCE comment_id_seq
  INCREMENT 1
  MINVALUE 1
  MAXVALUE 9223372036854775807
  START 78616
  CACHE 1;

CREATE TABLE comments (
comment_id int PRIMARY KEY,
path ltree,
comment text
);

CREATE INDEX comments_path_idx ON comments USING gist (path);

An insert would crudely (and untested-ly) look like:

CREATE FUNCTION busted_add_comment(text the_comment, int parent_comment_id) RETURNS void AS
$BODY$
DECLARE
    INT _new_comment_id; -- our new comment_id
    TEXT _parent_path;   -- the parent path
BEGIN
    _new_comment_id := nextval('comment_id_seq'::regclass);
    SELECT path INTO _parent_path FROM comments WHERE comment_id = parent_comment_id;

    -- this is probably busted SQL, but you get the idea... this comment's path looks like
    --   the.parent.path.US
    --
    -- eg (if parent_comment_id was 5 and our new comment_id is 43):
    --  3.5.43
    INSERT INTO comments (comment_id, comment, path) VALUES (_new_comment_id, the_comment, CONCAT(_parent_path, '.', _new_comment_id));

END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;

Or something. Basically the path is just a hierarchy made up of all the primary keys.

Cory R. King
A: 

There is an acts_as_tree plugin for Rails, which has worked well for me in the past. I had a fairly small tree, though - around 15,000 nodes.

Sarah Mei
A: 

I've become fond of the the nested set model for this kind of situation. The Updates and Inserts can be a bit tricky, but the selects are usually very concise and fast. The performance can be even better if you add an actual reference to the parent of a node (it will eliminate a join in some cases. It also includes a natural sorting of childnodes.

A typical query for the current node and all children would look like:

select name
from nestedSet c inner join nestedSet p ON c.lft BETWEEN p.lft AND p.rgt
where p.id = 1
order by lft

A few well placed group by clauses will also net you some quick stats about your tree.

Josh
A: 

Just to add, this link has a good explanation of Adjacency List Model and the Nested Set Models, including example SQLs for tree manipulation and such.

Hierarchies in an RDBMS is a difficult topic. I have Joe Celko’s Trees and Hierarchies in SQL for Smarties on my wish list to buy and read some day.

Evan