views:

1480

answers:

4

I have a Table that stores Hierarchical information using the Adjacency List model. (uses a self referential key - example below. This Table may look familiar):

category_id name                 parent
----------- -------------------- -----------
1           ELECTRONICS          NULL
2           TELEVISIONS          1
3           TUBE                 2
4           LCD                  2
5           PLASMA               2
6           PORTABLE ELECTRONICS 1
7           MP3 PLAYERS          6
8           FLASH                7
9           CD PLAYERS           6
10          2 WAY RADIOS         6


What is the best method to "flatten" the above data into something like this?

category_id lvl1        lvl2        lvl3        lvl4
----------- ----------- ----------- ----------- -----------
1           1           NULL        NULL        NULL
2           1           2           NULL        NULL
6           1           6           NULL        NULL
3           1           2           3           NULL
4           1           2           4           NULL
5           1           2           5           NULL
7           1           6           7           NULL
9           1           6           9           NULL
10          1           6           10          NULL
8           1           6           7           8


Each row is one "Path" through the Hierarchy, except there is a row for each node (not just each leaf node). The category_id column represents the current node and the "lvl" columns are its ancestors. The value for the current node must also be in the farthest right lvl column. The value in the lvl1 column will always represent the root node, values in lvl2 will always represent direct descendants of lvl1, and so on.

If possible the method to generate this output would be in SQL, and would work for n-tier hierarchies.

A: 

I haven't a clue how to do it in SQL, but what you want is to construct the graph and then walk it depth-first.

Yeah, everything that occurs to me in SQL seems irredeemably ugly, but I'm no Joe Celko.

Building a graph in something like Python would be pretty straightforward, though.

Charlie Martin
+4  A: 

To do multi-level queries across a simple adjacency-list invariably involves self-left-joins. It's easy to make a right-aligned table:

SELECT category.category_id,
    ancestor4.category_id AS lvl4,
    ancestor3.category_id AS lvl3,
    ancestor2.category_id AS lvl2,
    ancestor1.category_id AS lvl1
FROM categories AS category
    LEFT JOIN categories AS ancestor1 ON ancestor1.category_id=category.category_id
    LEFT JOIN categories AS ancestor2 ON ancestor2.category_id=ancestor1.parent
    LEFT JOIN categories AS ancestor3 ON ancestor3.category_id=ancestor2.parent
    LEFT JOIN categories AS ancestor4 ON ancestor4.category_id=ancestor3.parent;

To left-align it like your example is a bit more tricky. This comes to mind:

SELECT category.category_id,
    ancestor1.category_id AS lvl1,
    ancestor2.category_id AS lvl2,
    ancestor3.category_id AS lvl3,
    ancestor4.category_id AS lvl4
FROM categories AS category
    LEFT JOIN categories AS ancestor1 ON ancestor1.parent IS NULL
    LEFT JOIN categories AS ancestor2 ON ancestor1.category_id<>category.category_id AND ancestor2.parent=ancestor1.category_id
    LEFT JOIN categories AS ancestor3 ON ancestor2.category_id<>category.category_id AND ancestor3.parent=ancestor2.category_id
    LEFT JOIN categories AS ancestor4 ON ancestor3.category_id<>category.category_id AND ancestor4.parent=ancestor3.category_id
WHERE
    ancestor1.category_id=category.category_id OR
    ancestor2.category_id=category.category_id OR
    ancestor3.category_id=category.category_id OR
    ancestor4.category_id=category.category_id;

would work for n-tier hierarchies.

Sorry, arbitrary-depth queries are not possible in the adjacency-list model. If you are doing this kind of query a lot, you should change your schema to one of the other models of storing hierarchical information: full adjacency relation (storing all ancestor-descendent relationships), materialised path, or nested sets.

If the categories don't move around a lot (which is usually the case for a store like your example), I would tend towards nested sets.

bobince
Thank you for your answer. If the data was stored using the Nested Set model, would there be a better way to get this output than the second option you give above?
Aaron Hoffman
Also, Any ideas for improving the performance of the second query above?
Aaron Hoffman
+1  A: 

Traversing a tree of arbitrary depth generally involves recursive procedural code, unless you make use of the special features of some DBMS.

In Oracle, the CONNECT BY clause will permit you to traverse the tree in depth first order if you use adjacency list, as you did here.

If you use nested sets, the left sequence number will provide you with the order to visit the nodes.

Walter Mitty
+2  A: 

As mentioned, SQL has no clean way to implement tables with dynamically varying numbers of columns. The only two solutions I have used before are: 1. A fixed number Self-Joins, giving a fixed number of columns (AS per BobInce) 2. Generate the results as a string in a single column

The second one sounds grotesque initially; storing IDs as string?! But when the output is formatted as XML or something, people don't seem to mind so much.

Equally, this is of very little use if you then want to join on the results in SQL. If the result is to be supplied to an Application, it Can be very suitable. Personally, however, I prefer to do the flattening in the Application rather than SQL


I'm stuck here on a 10 inch screen without access to SQL so I can't give the tested code, but the basic method would be to utilise recursion in some way;
- A recursive scalar function can do this
- MS SQL can do this using a recursive WITH statement (more efficient)

Scalar Function (something like):

CREATE FUNCTION getGraphWalk(@child_id INT)
RETURNS VARCHAR(4000)
AS
BEGIN

  DECLARE @graph VARCHAR(4000)

  -- This step assumes each child only has one parent
  SELECT
    @graph = dbo.getGraphWalk(parent_id)
  FROM
    mapping_table
  WHERE
    category_id = @child_id
    AND parent_id IS NOT NULL

  IF (@graph  IS NULL)
    SET @graph = CAST(@child_id AS VARCHAR(16))
  ELSE
    SET @graph = @graph + ',' + CAST(@child_id AS VARCHAR(16))

  RETURN @graph

END


SELECT
  category_id                         AS [category_id],
  dbo.getGraphWalk(category_id)       AS [graph_path]
FROM
  mapping_table
ORDER BY
  category_id

I haven't used a recursive WITH in a while, but I'll give the syntax a go even though I don't have SQL here to test anything :)

Recursive WITH

WITH
  result (
    category_id,
    graph_path
  )
AS
(
  SELECT
    category_id,
    CAST(category_id AS VARCHAR(4000))
  FROM
    mapping_table
  WHERE
    parent_id IS NULL

  UNION ALL

  SELECT
    mapping_table.category_id,
    CAST(result.graph_path + ',' + CAST(mapping_table.category_id AS VARCHAR(16)) AS VARCHAR(4000))
  FROM
    result
  INNER JOIN
    mapping_table
      ON result.category_id = mapping_table.parent_id
)

SELECT
  *
FROM
  result
ORDER BY
  category_id


EDIT - OUTPUT for both is the same:

1   '1'
2   '1,2'
3   '1,2,3'
4   '1,2,4'
5   '1,2,5'
6   '1,6'
7   '1,6,7'
8   '1,6,7,8'
9   '1,6,9'
Dems
Thank you. Both of these methods work (aside from the differences you mention above), but the SQL does needs a little tweaking. If you get a chance, please update it (and please include output). I would, but I can not yet edit your answer.
Aaron Hoffman
Syntax and Output sorted
Dems