Here is an example of my tree table ([id] is an identity column):

[id], [parent_id], [path]

1, NULL, 1

2, 1, 1-2

3, 1, 1-3

4, 3, 1-3-4

My goal is to query multiple rows of this table quickly and view the full path of each node from its root, through its ancestors, down to itself. The ultimate question is: should I generate this path on insert and maintain it in its own column, or generate it at query time to save disk space? I guess it depends on whether this table is write-heavy or read-heavy.

I've been contemplating several approaches to using the "path" characteristic of this parent/child relationship and I just can't seem to settle on one. This "path" is simply for display purposes and serves absolutely no purpose other than that. Here is what I have done to implement this "path."

  1. AFTER INSERT TRIGGER - requires passing a NULL path to the insert, then updating the path for the record at the inserted row's identity
  2. INSTEAD OF INSERT TRIGGER - does not require the insert to pass a NULL path, but does require the trigger to insert with a NULL path and then update the path for the record at SCOPE_IDENTITY()
  3. STORED PROCEDURE - requires all inserts into this table to go through a stored procedure that implements the trigger logic
  4. VIEW - requires building the path in the view

1 and 2 seem annoying if massive amounts of data are entered at once.
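For what it's worth, approach 1 does not have to work row by row. Below is a minimal sketch of a set-based AFTER INSERT trigger that fills in the path for every row of a multi-row insert in one UPDATE. The table name dbo.Tree, the trigger name trg_Tree_SetPath, and the VARCHAR sizes are assumptions made for illustration; the sketch also assumes a parent row is always inserted (and its path populated) in an earlier statement than its children.

```sql
-- Hypothetical table matching the example above (names and sizes are assumptions).
CREATE TABLE dbo.Tree
(
    id        INT IDENTITY(1, 1) PRIMARY KEY,
    parent_id INT NULL REFERENCES dbo.Tree (id),
    [path]    VARCHAR(900) NULL
);
GO

CREATE TRIGGER dbo.trg_Tree_SetPath
ON dbo.Tree
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- One set-based UPDATE covers every row in the "inserted" pseudo-table.
    UPDATE t
    SET t.[path] = CASE
                       -- Root node: the path is just its own id.
                       WHEN t.parent_id IS NULL
                           THEN CAST(t.id AS VARCHAR(11))
                       -- Child node: parent's path plus '-' plus its own id.
                       -- Stays NULL if the parent's path is not populated yet.
                       ELSE p.[path] + '-' + CAST(t.id AS VARCHAR(11))
                   END
    FROM dbo.Tree AS t
    INNER JOIN inserted AS i ON i.id = t.id
    LEFT JOIN dbo.Tree AS p ON p.id = t.parent_id;
END;
GO
```

With this in place, something like INSERT INTO dbo.Tree (parent_id) VALUES (1), (1), (3) should populate the path for all three rows in a single trigger execution, provided rows 1 and 3 already exist with their paths set.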

3 seems annoying because all inserts must go through the procedure in order to have a valid path populated.

1, 2, and 3 require maintaining a path column on the table.

4 removes all the limitations of the above, but requires the view to perform the path logic and requires use of the view whenever a path is to be displayed.

I have successfully implemented all of the above approaches and I'm mainly looking for some advice. Am I way off the mark here, or are any of the above acceptable? Each has its advantages and disadvantages.

A: 

If it's just for display purposes, then use method 5: don't bother doing it!

Have your UI layer handle it if and when required.

TFD
A: 

"It depends" sure applies to this one. There are so many possibilities that it's just not possible to name The Best One. Here's a handful of ideas.

How often is new data added? Can data be modified (or items deleted) such that a hierarchical "chain" changes? How much data is there, and how big is the table? How will you be using the data? How important is it for the path to be up to date? All of these lead to different possible implementations, based on performance requirements.

We have a similar setup in a data warehouse. The data gets entered in controlled ETL batches, so we have the luxury of feeding it through a stored procedure to properly determine and load the "path" column, and then we never worry about it again.

Barring strong reasons not to, I'd go with the stored procedure implementation, if only because the code can get a bit tricky. If you cannot stop people from inserting/updating/deleting [I originally wrote "monkeying"] the data outside of the stored procedure, then I'd think you have security issues. If the data can be inaccurate or out of date for short periods of time, you could have a scheduled routine that regularly checks and recalibrates entries (either only new, unset items, or everything, rechecked for modifications).
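As a rough illustration of the scheduled "recalibrate" idea, the statement below recomputes every path with a recursive CTE and rewrites only the rows whose stored value is missing or stale. It is a sketch rather than a tested routine: the table name dbo.Tree, the column names id, parent_id and path (taken from the question), and the VARCHAR(900) size are assumptions, and it would still need to be wrapped in whatever job or procedure does the scheduling.

```sql
-- Recompute the full path of every node, starting from the roots.
WITH Paths AS
(
    -- Anchor: root nodes, whose path is just their own id.
    SELECT id, CAST(id AS VARCHAR(900)) AS full_path
    FROM dbo.Tree
    WHERE parent_id IS NULL

    UNION ALL

    -- Recursive step: append each child's id to its parent's path.
    SELECT t.id,
           CAST(p.full_path + '-' + CAST(t.id AS VARCHAR(11)) AS VARCHAR(900))
    FROM dbo.Tree AS t
    INNER JOIN Paths AS p ON t.parent_id = p.id
)
-- Touch only rows that are unset or no longer match the computed path.
UPDATE t
SET t.[path] = c.full_path
FROM dbo.Tree AS t
INNER JOIN Paths AS c ON c.id = t.id
WHERE t.[path] IS NULL
   OR t.[path] <> c.full_path;
```

Note that SQL Server stops a recursive CTE after 100 levels by default, so a very deep tree would need an OPTION (MAXRECURSION n) hint on the statement.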

Philip Kelley
A: 

You could use a recursive CTE (Common Table Expression), but don't ask me how well it'll perform :-)

Something like:

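WITH RecursiveCTE AS
(
    -- Anchor: root nodes (no parent); the path is just the node's own ID
    SELECT ID, ParentID, CAST(ID AS VARCHAR(100)) AS [Path]
    FROM dbo.YourTableName
    WHERE ParentID IS NULL

    UNION ALL

    -- Recursive step: append each child's ID to its parent's path
    -- (VARCHAR(11) is wide enough for any INT; VARCHAR(3) breaks for IDs above 999)
    SELECT t.ID, t.ParentID, CAST(cte.[Path] + '-' + CAST(t.ID AS VARCHAR(11)) AS VARCHAR(100)) AS [Path]
    FROM dbo.YourTableName t
    INNER JOIN RecursiveCTE cte ON t.ParentID = cte.ID
)
SELECT * FROM RecursiveCTE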

This works fine in my case, with no extra maintenance needed. But again: I cannot predict how the performance will be, so try it!

marc_s