ansaurus

Question

Creating a flattened table/view of a hierarchically-defined set of data

Answer 1

So what you want is to materialize the transitive closure of the hierarchy. That is, given this application table ...

 ID   | PARENT_ID
------+----------
    1 | 
    2 |         1
    3 |         2
    4 |         2
    5 |         4

... the graph table would look like this:

 PARENT_ID | CHILD_ID
-----------+----------
         1 |        2
         1 |        3
         1 |        4
         1 |        5
         2 |        3
         2 |        4
         2 |        5
         4 |        5
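To make the mapping concrete, here is a minimal sketch in Python (not Oracle SQL, purely for illustration) that derives the graph rows from the source rows. The function name `transitive_closure` and the dictionary layout are my own assumptions, and it presumes each ID has at most one parent and that there are no cycles.

```python
def transitive_closure(parent_of):
    """Return sorted (ancestor, descendant) pairs for a child -> parent mapping.

    Assumes at most one parent per ID and no cycles.
    """
    pairs = set()
    for child, parent in parent_of.items():
        ancestor = parent
        # Walk up the chain of parents, recording every ancestor of this child.
        while ancestor is not None:
            pairs.add((ancestor, child))
            ancestor = parent_of.get(ancestor)
    return sorted(pairs)


# Source table from above: ID -> PARENT_ID (None for the root).
source = {1: None, 2: 1, 3: 2, 4: 2, 5: 4}
closure = transitive_closure(source)
for parent_id, child_id in closure:
    print(parent_id, child_id)
```

The eight pairs it prints are the same eight rows as the graph table above.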

It is possible to maintain a table like this in Oracle, although you will need to roll your own framework for it. The question is whether it is worth the overhead. If the source table is volatile then keeping the graph data fresh may cost more cycles than you will save on the queries. Only you know your data's profile.

I don't think you can maintain such a graph table with CONNECT BY queries and cascading foreign keys. Too much indirect activity, too hard to get right. Also a materialized view is out, because we cannot write a SQL query which will zap the 1->5 record when we delete the source record for ID=4.

So what I suggest is that you read a paper called Maintaining Transitive Closure of Graphs in SQL by Dong, Libkin, Su and Wong. This contains a lot of theory and some gnarly (Oracle) SQL but it will give you the grounding to build the PL/SQL you need to maintain a graph table.

"can you expand on the part about it being too difficult to maintain with CONNECT BY/cascading FKs? If I control access to the table and all inserts/updates/deletes take place via stored procedures, what kinds of scenarios are there where this would break down?"

Consider the record 1->5 which is a short-circuit of 1->2->4->5. Now what happens if, as I said before, we delete the source record for ID=4? Cascading foreign keys could delete the entries for 2->4 and 4->5. But that leaves 1->5 (and indeed 2->5) in the graph table although they no longer represent a valid edge in the graph.
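The stale rows can be shown with a small simulation in Python (again only an illustration, not Oracle). It rests on two assumptions of mine: the graph table has cascading foreign keys on both PARENT_ID and CHILD_ID, so every row mentioning the deleted ID disappears, and the source record for ID=5 survives with an empty parent.

```python
def transitive_closure(parent_of):
    """Return the set of (ancestor, descendant) pairs; assumes no cycles."""
    pairs = set()
    for child, parent in parent_of.items():
        ancestor = parent
        while ancestor is not None:
            pairs.add((ancestor, child))
            ancestor = parent_of.get(ancestor)
    return pairs


# Graph table before the delete.
graph = {(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (4, 5)}

# Assumed cascade: drop every graph row that mentions the deleted ID.
deleted_id = 4
after_cascade = {row for row in graph if deleted_id not in row}

# Assumed source table after the delete: 5 is left without a parent.
source_after = {1: None, 2: 1, 3: 2, 5: None}
valid = transitive_closure(source_after)

# Rows still in the graph table that no longer correspond to any path.
stale = sorted(after_cascade - valid)
print(stale)
```

Under those assumptions the leftover rows are exactly 1->5 and 2->5: the cascade never touches them because neither row mentions ID=4.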

What might work (I think, I haven't done it) would be to use an additional synthetic key in the source table, like this.

 ID   | PARENT_ID | NEW_KEY
------+-----------+---------
    1 |           | AAA
    2 |         1 | BBB
    3 |         2 | CCC
    4 |         2 | DDD
    5 |         4 | EEE

Now the graph table would look like this:

 PARENT_ID | CHILD_ID | NEW_KEY
-----------+----------+---------
         1 |        2 | BBB
         1 |        3 | CCC
         1 |        4 | DDD
         1 |        5 | DDD
         2 |        3 | CCC
         2 |        4 | DDD
         2 |        5 | DDD
         4 |        5 | DDD

So the graph table has a foreign key referencing the relationship in the source table which generated it, rather than linking to the ID. Then deleting the record for ID=4 would cascade deletes of all records in the graph table where NEW_KEY=DDD.
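As a rough check of that idea, this Python sketch (illustrative only, and as untested against Oracle as the idea itself) copies the graph rows above and mimics the cascade. The helper name `cascade_delete` is hypothetical.

```python
# Graph rows copied from the table above: (parent_id, child_id, new_key).
graph = [
    (1, 2, "BBB"), (1, 3, "CCC"), (1, 4, "DDD"), (1, 5, "DDD"),
    (2, 3, "CCC"), (2, 4, "DDD"), (2, 5, "DDD"), (4, 5, "DDD"),
]


def cascade_delete(rows, deleted_key):
    """Mimic ON DELETE CASCADE on a foreign key over NEW_KEY."""
    return [row for row in rows if row[2] != deleted_key]


# Deleting the source record for ID=4 removes its key, DDD.
remaining = cascade_delete(graph, "DDD")
print(remaining)
```

Only 1->2, 1->3 and 2->3 are left, which are the paths that still exist once ID=4 is gone.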

This would work if any given ID can only have zero or one parent IDs. But it won't work if it is permissible for this to happen:

 ID   | PARENT_ID
------+----------
    5 |         2
    5 |         4

In other words the edge 1->5 represents both 1->2->4->5 and 1->2->5. So, what might work depends on the complexity of your data.
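To see why a single key per graph row is not enough in that case, this Python sketch (illustrative, with a hypothetical helper `paths`, and assuming the data has no cycles) lists every route from 1 to 5 once ID=5 has both parents.

```python
def paths(edges, start, goal):
    """Return every path from start to goal; assumes the edges form no cycle."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    found = []

    def walk(node, trail):
        if node == goal:
            found.append(trail)
            return
        for nxt in sorted(children.get(node, [])):
            walk(nxt, trail + [nxt])

    walk(start, [start])
    return found


# Direct parent -> child edges, with 5 having both 2 and 4 as parents.
edges = [(1, 2), (2, 3), (2, 4), (2, 5), (4, 5)]
routes = paths(edges, 1, 5)
print(routes)
```

It finds two routes, 1->2->4->5 and 1->2->5, so removing the 4->5 relationship must not remove the 1->5 row. A single NEW_KEY column on that row cannot express this.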

APC 2010-08-03 12:38:21

@APC: I have two tables I would like to apply this to -- one is volatile, the other is updated only once during off-hours by a single-threaded process. My hope was to implement it on the less volatile table first and then port the framework over to the other table once I worked all the kinks out. I haven't absorbed everything from the paper you linked to yet, but it does seem to be exactly what I'm looking for.

RenderIn 2010-08-03 14:33:57

@APC: I'm not questioning your judgment, but can you expand on the part about it being too difficult to maintain with CONNECT BY/cascading FKs? If I control access to the table and all inserts/updates/deletes take place via stored procedures, what kinds of scenarios are there where this would break down? Even if others had direct insert/update/delete access, if the logic were implemented in triggers/cascading FKs, what's an example of some problems I would face?

RenderIn 2010-08-03 14:35:07
