I have parent-child data in Excel which gets loaded into a third-party system running MS SQL Server. The data represents a directed (hopefully) acyclic graph. Third party means I don't have a completely free hand in the schema. The Excel data is a concatenation of other files, and it is possible that in the cross-references between the various files someone has caused a loop, i.e. X is a child of Y (X->Y) and then elsewhere (Y->A->B->X). I can write VB, VBA etc. against the Excel file or on the SQL Server database. The Excel file is almost 30k rows, so I'm worried about a combinatorial explosion as the data is set to grow; techniques like creating a table with all the paths might be pretty unwieldy. I'm thinking of simply writing a program that, for each root, does a tree traversal to each leaf and flags it if the depth gets greater than some nominal value.
Better suggestions or pointers to previous discussion welcomed.

+3  A: 

You can use a recursive CTE to detect loops:

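with prev as (
    -- anchor: every row starts a chain at generation 1
    select RowId, 1 as GenerationsRemoved
    from YourTable
    union all
    -- recursive step: descend to the children of each row found so far
    -- (RowId must be qualified here, since both prev and YourTable have it)
    select YourTable.RowId, prev.GenerationsRemoved + 1
    from prev
    inner join YourTable on prev.RowId = YourTable.ParentRowId
    and prev.GenerationsRemoved < 55
)
select *
from prev
where GenerationsRemoved > 50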

This does require you to specify a maximum recursion level: in this case the CTE runs to 55 levels, and it flags as erroneous any row that ends up more than 50 generations below some ancestor.

Andomar
This is pretty much what I did too. It works well.
Gabriel McAdams
I'd never heard of CTEs as my DB experience is largely zSeries DB/2. Thank you for the pointer to them. I think I now have the answer to a number of other questions.
And found a decent tutorial here: http://msdn.microsoft.com/en-us/library/ms186243.aspx Thanks, guys.
Though since the CTE has been running for 3 hours now I think I need to consider alternatives. :-)
@wudang: Do you have an index on the ParentRowId field? Also turn on "Show Actual Execution Plan" and see if it suggests any indexes.
Andomar
@Andomar - yes both the Parent and Child columns are indexed. I think I'll just have to write some code to load the data in memory as a DAG and traverse it building a path from each leaf and checking each discovered node does not already exist in the path.
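For anyone taking the same in-memory route, here is a minimal sketch of that kind of traversal in Python (not VB/VBA, and not code anyone in this thread actually ran). It assumes the rows have already been read into (parent, child) pairs; the function name `find_cycle` and that input shape are made up for illustration. It differs from the description above in one respect: it walks down from parents to children and marks fully explored nodes, so each node and edge is handled once rather than once per path, which should sidestep the combinatorial blow-up on shared subtrees.

```python
def find_cycle(edges):
    """Return one loop as a list of nodes (first == last), or None if acyclic.

    edges: iterable of (parent, child) pairs (hypothetical input shape).
    """
    children = {}
    nodes = set()
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
        nodes.update((parent, child))

    # 0 = unvisited, 1 = on the current path, 2 = fully explored
    UNVISITED, ON_PATH, DONE = 0, 1, 2
    state = dict.fromkeys(nodes, UNVISITED)

    for start in nodes:
        if state[start] != UNVISITED:
            continue
        # Iterative depth-first search, so deep chains cannot hit
        # Python's recursion limit.
        path = [start]                            # nodes on the current path
        stack = [iter(children.get(start, ()))]   # one child iterator per path node
        state[start] = ON_PATH
        while stack:
            for child in stack[-1]:
                if state[child] == ON_PATH:
                    # child is already on the current path: that is a loop
                    return path[path.index(child):] + [child]
                if state[child] == UNVISITED:
                    state[child] = ON_PATH
                    path.append(child)
                    stack.append(iter(children.get(child, ())))
                    break
                # state[child] == DONE: explored earlier, no loop through it
            else:
                # every child of the last path node is explored; retire it
                state[path.pop()] = DONE
                stack.pop()
    return None
```

Calling `find_cycle(rows)` on the full list of pairs returns the nodes of the first loop it happens to find, which gives you something concrete to trace back to the offending source file, or `None` when the data really is a DAG.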