views:

52

answers:

1

I have a table that defines a heirarchy:

Create Table [example] (
    id          Integer   Not Null Primary Key,
    parentID    Integer       Null,
    largeData1  nVarChar(max) Null,
    largeData2  nVarChar(max) Null);
    -- largeData3...n also exist

Insert Into [example] (id, parentID, largeData1, largeData2)
Select 1, null, 'blah blah blah', null          Union
Select 2,    1, null,             null          Union
Select 3,    1, 'foo bar foobar', null          Union
Select 4,    3, null,             'lorem ipsum' Union
Select 5,    4, null,             null;

Hierarchy diagram for this data:

Hierarchy diagram

I want to write a query that will return a single row for any given [id] value. The row should contain that row's [id] and [parentID] information. It should also contain the [largeData1...n] fields. However, if a largeData field is null, then it should traverse up the hierarchy until a non-null value for that field is encountered. It should, in short, function like the coalesce function, except across a hierarchy of rows instead of a set of columns.

Example:

Where [id] = 1:

id:          1
parentID:    null
largeData1:  blah blah blah
largeData2:  null

Where [id] = 2

id:          1
parentID:    1
largeData1:  blah blah blah
largeData2:  null

Where [id] = 3

id:          3
parentID:    1
largeData1:  foo bar foobar
largeData2:  null

Where [id] = 4

id:          4
parentID:    3
largeData1:  foo bar foobar
largeData2:  lorem ipsum

Where [id] = 5

id:          5
parentID:    4
largeData1:  foo bar foobar
largeData2:  lorem ipsum

So far, I have this:

Declare @id Integer; Set @id = 5;

With heirarchy
    (id, parentID, largeData1, largeData2, [level])
As (
    Select id, parentID, largeData1,
           largeData2, 1 As [level]
    From example
    Where id = @id

    Union All

    Select parent.id, parent.parentID,
           parent.largeData1,
           parent.largeData2,
           child.[level] + 1 As [level]
    From example As parent
    Inner Join heirarchy As child
        On parent.id = child.parentID)

Select id, parentID,
   (Select top 1 largeData1
    From heirarchy
    Where largeData1 Is Not Null
    Order By [level] Asc) As largeData1,

   (Select top 1 largeData2
    From heirarchy
    Where largeData2 Is Not Null
    Order By [level] Asc) As largeData2

From example
Where [id] = @id;

This returns the results that I am looking for. However, according to the query plan, it is making a separate pass through the hierarchy for each largeData field that I pull back.

How can I make this more efficient?

This is obviously a simplified version of a more complex problem. The final query will return data in XML format, so any solutions involving the FOR XML clause are perfectly fine.

I can create a CLR aggregate function for this, if doing so would help. I have not yet explored that route.

+4  A: 

I came up with this:

DECLARE @Id  int

SET @Id = 5


;WITH cte (Id, ParentId, SaveParentId, LargeData1, LargeData2)
 as (--  The "anchor", your target Id
     select
        ex.Id
       ,ex.ParentId
       ,ex.ParentId  SaveParentId  --  Not changed throughout the CTE
       ,ex.LargeData1
       ,ex.LargeData2
      from Example ex
      where ex.Id = @Id
     union all select
                 cte.Id
                ,ex.ParentId
                ,cte.SaveParentId  --  Not changed throughout the CTE
                 --  These next are only "reset" if they are null and a not-null
                 --  value was found at this level 
                ,isnull(ex.LargeData1, cte.LargeData2)  
                ,isnull(ex.LargeData2, cte.LargeData2)
      from Example ex
       inner join cte
        on cte.ParentId = ex.Id)
 select
   Id
  ,SaveParentId     ParentId
  ,max(LargeData1)  LargeData1
  ,max(LargeData2)  LargeData2
 from cte
 group by Id, SaveParentId

Basically, start at your target node and walk up the tree, replacing your null columns with not-null values if and when they are found.

(Sorry, but I don't do XML on weekends.)

Philip Kelley
+1 for pushing up non-null values. But using MAX may be problematic. If row 3 of the sample data says "afoo bar bar" instead of "foo bar bar", the query for @id=5 will return "blah blah blah" for largeData1.
8kb
When "walking up" within the CTE, if at a given level a column's value is null, it is replaced with the value at that level, otherwise it is left unchanged. One row is produced per level. Thus, when the cte is done, the value for a column across all the rows will be either null or the first value encountered. Aggregations ignore nulls, leaving only the one value for max (or min) to select.
Philip Kelley