views:

555

answers:

5

I'm trying to implement a tree like structure using a Materialized Path model described here: http://www.dbazine.com/oracle/or-articles/tropashko4.

Is it possible to enforce referential integrity on the [path] field? I don't see how SQL could do it, do I have to do it manually in the DAL?

+2  A: 

Yes, you have to enforce data integrity yourself in the DAL when you use either Materialized Path or Nested Sets solutions for hierarchical data.

Adjacency List supports referential integrity, and this is true also for a design I call "Closure Table" (Tropashko calls this design "transitive closure relation").

Bill Karwin
Thanks for the info, Bill.
hyperslug
Here's an article on a transitive closure solution: http://www.codeproject.com/KB/database/Modeling_DAGs_on_SQL_DBs.aspx @Bill Karwin: how big datasets have you deployed using "closure tables" in OLTP/interactive scenarios without hitting performance problems for inserts?
nawroth
@nawroth: Only on the order of thousands of nodes in the tree, so good point. If you need to represent very deep trees, you end up with a lot of rows. If you need to represent many shallow trees, it's more modest.
Bill Karwin
New information: I did a proof of concept with a tree of 490,000 nodes and I can fetch the list of ancestors of any node in 0.0006sec, and a whole tree with per-level expansion in 0.3 seconds. http://www.slideshare.net/billkarwin/models-for-hierarchical-data
Bill Karwin
@Bill, thanks for remembering this thread. I studied the approaches a bit more and realized I like closure table the best. It makes sense in my scenario that users inserting data will not notice the overhead, and it enables large, fast retrieves.
hyperslug
+1  A: 

In the materialized path model you can use arbitrary strings (maybe unicode strings to allow more than 256 children) instead of special strings of form "x.y.z". The id of the parent is then the id of the direct children with the last character removed. You can easily enforce this with a check constraint like (my example works in PostgreSQL)

check(parent_id = substring(id from 1 for char_length(id)-1)),

within your create table command. If you insist on strings of form "x.y.z", you'll have to play around with regular expressions, but I'd guess it's possible to find a corresponding check constraint.

Whoever
If you want to enforce that roots have char_length(id)=1, you can additionally add the constraint check((parent_id is null) or (char_length(id)=1)) to the table definition.
Whoever
+3  A: 

"Materialized path" as presented by Vadim Tropashko in that article, introduces the notion of order into a relation ("Jones is the second member".).

"Materialized path" is nothing but "some form of materialized view" on the transitive closure, and therefore suffers all and exactly the same problems as any other "materialized view", except that matters are algorithmically worse precisely because of the involvement of a closure.

SQL is almost completely powerless when constraints-on-a-closure are in play. (Meaning : yes, SQL requires you to do everything yourself.) It's one of those areas where the RM shows the maximum of its almost unlimited power, but SQL fails abysmally, and where it is such a shame that the majority of people mistake SQL for being relational.

(@Bill Karwin : I'd like to be able to give you +1 for your remark on the relation between the depth of the trees and the result on performance. There are no known algorithms to compute closures that perform well in the case of trees with "crazy" depths. It's an algorithmic problem, not an SQL nor a relational one.)

EDIT

Yes, RM = Relational Model

RM = relational model?
hyperslug
+1 Yes, the materialized path is an example of denormalization. It has some benefits of efficiency for certain types of queries, but it sacrifices benefits of the RM such as referential integrity.
Bill Karwin
Btw, you can hover over an invisible upvote arrow to the left of a comment, and give the comment a little endorsement. That doesn't give rep points, though.
Bill Karwin
A: 

Yes, we can "enforce referential integrity on the [path] field". I did that a couple of years ago, described here:

Store your configuration settings as a hierarchy in a database

AlexKuznetsov
A: 

When using Materialized path encoding child records refer to parent in some kind of bizarre referential integrity constraint with encoding of the parent being substring of that of child. "SQL Design Patterns, Expert Guide to SQL Programming" book reveals what is really going on. Materialized path wrapped into a proper mathematical shape is matrix encoding. A child matrix left column explicitly refers to the parent right column! The matrix encoding encompasses everything: it is materialized path, nested intervals, and adjacency list altogether.

Tegiri Nenashi