views:

57

answers:

2

Apologies for the less than ideal title; had a hard time coming up with something.

Three entities: Products, Clients and Client_Product_Authorization.

A Client has access to many Products.

A Product is accessed by many Clients.

A Client_Product_Authorization authorizes a Client to access a Product.

We have 100,000+ Products (virtual goods) and managing Client_Product_Authorizations becomes a very labor intensive task. So we have Products that is a container of other Products. Example:

Product X is a container for Products 1 through 2000.

So, by creating a Client_Product_Authorization granting a Client the Product X, we are indirectly providing for the client to access products 1 through 2000. Mind that product 1 might be contained in different container products (so, yes, it is a many-to-many self relationship).

Here is the entity level data model:

alt text

The advantage of this mechanism is that we can just change the composition of Product X (adding or removing other products) and it automatically adjusts the product list available to the clients authorized to Product X. Managing awarding access to our large product-base is a matter of selecting a few container products.

The disadvantage is that it now became harder (in terms of creating a SQL statement, because of the many-to-many self-relationship) to know what the Client is actually authorized to see (as in the individual non-container products). As in:

Product Z is a container for Product X and Product Y

Product X is a container for Products 1 through 2000

Product Y is a container for Products 2001 through 5000

What are the actual non-container products a client authorized to Product Z can see?

Products 1 through 2000 and 2001 through 5000.

I would like to make the list of non-container products a client is authorized for to be materialized in some way. So that questions like:

Should Client ABC be allowed to see Product 78?

OR

What products is Client ABC authorized to see?

can be easily responded with a query.

The goal is to make the job of software trying to determine the list of products accessible to a client a simple mechanism, instead of requiring a traversal through all container products, their sub-container products, etc etc etc.

Three questions:

a) Is there a better data-model for this scenario?

b) Is there a different mechanism to simplify the management of access authorization for this large set of products?

c) How would you go about making the obtention of the list of non-container products available to a client as simple as possible?

I appreciate the collective's input. Thanks in advance!


Update: Limiting the number of nested products is not an option for business reasons.

+3  A: 

Since you're using SQL 2005 you should have access to common table expressions (CTEs) which makes the recursion of finding the children of a product much easier. You might want to look into CTEs and see if that's sufficient for what you're doing.

Also, I don't recall this specific scenario and my copy of the book is at home, but Joe Celko wrote a very good book on modeling hierarchies and trees in an RDBMS. It's probably worth looking into to see if there is a better model for this. He had a few rather ingenious ones for other scenarios that didn't seem obvious at first, but which are very efficient. Even if there isn't a direct match, some of the techniques which he uses might be useful.

The model which you have is what's referred to as the adjacency list model. Celko also shows how to model hierarchies using what are called the nested set model and the path enumeration model.

The nested set model may seem a little complicated at first, but it's actually simple in a way. It's more expensive for updating, but selects from it are VERY fast compared to just about any other way to model hierarchies. You can find an abbreviated description of it here. Since a product can be contained in multiple trees you would have to adapt it slightly for your case.

The path enumeration model basically just uses a delimited (or XML) string to list out the path to the row in question, starting at the root of the tree. You then use string (or XQuery) functions to find children of a parent, etc. As far as I know, it's only really useful for trees, which have a single root, so I don't think that you could use it in your case.

Tom H.
For 2005, this is pretty much the way you have to model hierarchies. For 2008, they added the hierarchyid datatype, which rocks and radically simplifies working with hierarchial data.
Chris Lively
It's definitely not the way that you HAVE to model hierarchies. See the book mentioned above. The nested set hierarchy model has started to get a lot of use in place of the usual linked list model.
Tom H.
Tom, +1! Please modify your answer to include this reference to linked list models vs. nested set models. Just naming the two terms makes a world of difference to people looking at the question and your answer.
celopes
+1  A: 

From a purely relational perspective (and my own personal perspective), what you have is the most logical means of representation. The downside, of course, is that SQL isn't terribly well-suited for recursion. There are other paradigms out there for storing tree structures (this is a long-standing issue in RDBMS discussions), but I won't get into them because I generally feel like they sacrifice readability and maintainability in favor of ease of querying. That may be what you need, but I don't know.

Since you're running 2005, you can do recursion pretty easily with common table expressions (CTE's; see this MSDN article). Otherwise, you either have to turn it into a multi-step procedure (a stored procedure is probably a good candidate, since you can hide the implementation complexity from the calling code) or limit the number of levels you're willing to look (ie, products can only be nested 3 levels deep) by adding left joins for each level.

Adam Robinson