I have parent-child data in Excel which gets loaded into a 3rd-party system running MS SQL Server. The data represents a directed (hopefully) acyclic graph. 3rd party means I don't have a completely free hand in the schema. The Excel data is a concatenation of other files, and it is possible that in the cross-references between the various files someone has created a loop, i.e. X is a child of Y (X->Y) and then elsewhere (Y->A->B->X). I can write VB, VBA etc. against the Excel file or the SQL Server db. The Excel file is almost 30k rows and the data is set to grow, so I'm worried about a combinatorial explosion; techniques like creating a table with all the paths might be pretty unwieldy. I'm thinking of simply writing a program that, for each root, does a tree traversal to each leaf and flags any path whose depth exceeds some nominal value.
Better suggestions or pointers to previous discussion welcomed.
A:
You can use a recursive CTE to detect loops:
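with prev as (
    -- anchor: every row starts a chain of length 1
    select RowId, 1 as GenerationsRemoved
    from YourTable
    union all
    -- recursive step: move from each row reached so far to its children
    select YourTable.RowId, prev.GenerationsRemoved + 1
    from prev
    inner join YourTable on prev.RowId = YourTable.ParentRowId
        and prev.GenerationsRemoved < 55
)
select *
from prev
where GenerationsRemoved > 50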
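This does require you to specify a maximum recursion level: in this case the CTE stops at 55 generations, and it reports as erroneous any row that sits more than 50 generations below some other row. A chain that long is assumed to come from a loop rather than from genuine nesting.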
Andomar
2010-07-12 16:28:48
This is pretty much what I did too. It works well.
Gabriel McAdams
2010-07-12 16:29:56
I'd never heard of CTEs as my DB experience is largely zSeries DB/2. Thank you for the pointer to them. I think I now have the answer to a number of other questions.
2010-07-13 08:56:21
And found a decent tutorial here: http://msdn.microsoft.com/en-us/library/ms186243.aspx Thanks guys
2010-07-13 10:20:18
Though since the CTE has been running for 3 hours now I think I need to consider alternatives. :-)
2010-07-13 15:36:55
@wudang: Do you have an index on the ParentRowId field? Also turn on "Show Actual Execution Plan" and see if it suggests any indexes.
Andomar
2010-07-13 18:23:43
@Andomar - yes, both the Parent and Child columns are indexed. I think I'll just have to write some code to load the data into memory as a DAG and traverse it, building a path from each leaf and checking that each newly discovered node does not already exist in the path.
2010-07-14 11:22:20
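The in-memory check described in the last comment could look roughly like the sketch below. It is only an illustration under stated assumptions: the rows are assumed to have been read already into (child, parent) pairs, the function name find_cycle is hypothetical, and loading from Excel or SQL Server is left out. Rather than rebuilding a path from every leaf, it uses a depth-first search that marks each node as unvisited, on the current path, or finished, which is a common variant of the same idea.

```python
from collections import defaultdict

def find_cycle(edges):
    """Return one loop as a list of nodes (first node repeated last), or None.

    `edges` is an iterable of (child, parent) pairs; both names are
    assumptions about how the rows were loaded, not part of the schema.
    """
    parents = defaultdict(list)
    nodes = set()
    for child, parent in edges:
        parents[child].append(parent)
        nodes.update((child, parent))

    UNVISITED, ON_PATH, DONE = 0, 1, 2
    state = dict.fromkeys(nodes, UNVISITED)

    # Start from every node, not only leaves: a loop may be unreachable
    # from any leaf or root.
    for start in nodes:
        if state[start] != UNVISITED:
            continue
        path = [start]                      # nodes on the current walk
        stack = [iter(parents[start])]      # explicit stack, no recursion limit
        state[start] = ON_PATH
        while stack:
            for nxt in stack[-1]:
                if state[nxt] == ON_PATH:
                    # nxt is already on the current path, so this is a loop
                    return path[path.index(nxt):] + [nxt]
                if state[nxt] == UNVISITED:
                    state[nxt] = ON_PATH
                    path.append(nxt)
                    stack.append(iter(parents[nxt]))
                    break
            else:
                # every parent of the node on top has been explored
                state[path.pop()] = DONE
                stack.pop()
    return None

# The loop from the question: X->Y, then Y->A->B->X
print(find_cycle([("X", "Y"), ("Y", "A"), ("A", "B"), ("B", "X")]))
```

Because each node and each row is handled once, the work grows linearly with the size of the data instead of with the number of distinct paths, which is the part that blows up when every path is enumerated.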