tags:

views:

50

answers:

2

I'm building a proof of concept schema for a product catalog to possibly replace a very aging and crufty one we use.

In our business, we sell both physical materials and services (one time and reoccurring charges).

The current catalog schema has each distinct category broken out into individual tables, while this is nicely normalized and performs well, it is fairly difficult to extend. Adding a new attribute to a particular product involves changing the table schema and backpopulating old data.

An idea I've been toying with has been something along the line of a base set of entity tables in 3rd normal form, these will contain the facts that are common among ALL products.

Then, I'd like to build an Entity-Attribute-Value schema that allows each entity type to be extended in a flexible way using just data and no schema changes.

Finally, I'd like to denormalize this data model into materialized views for each individual entity type. These views are what the application would access.

We also have many tables that contain business rules and compatibility rules. These would join against the base entity tables instead of the views.

My big concerns here are:

  • Performance - Attribute-Entity-Value schemas are flexible, but typically perform poorly, should I be concerned?
  • More Performance - Denormalizing using materialized views may have some risks, I'm not positive on this yet.
  • Complexity - While this schema is flexible and maintainable using just data, I worry that the complexity of the design might make future schema changes difficult.

For those who have designed product catalogs for large scale enterprises, am I going down the totally wrong path? Is there any good best practice schema design reading available for product catalogs?

+2  A: 

Having a single generic PRODUCTS table is a good idea but I am not sure what you would gain from using EAV for the specific lines. There is no additional flexibility in your solution: if you add a new Product you need a new materialized view; if you add a new attribute to an existing Product you need to re-define and rebuild the materialized view, which is really no different from back-populating data (except it will probably take longer).

I don't think the performance and scalability issues of the EAV model apply in this scenario, as the application will be hitting the materialized views (except for the aforementioned rebuilding). Similarly, the performance hit of denormalization doesn't apply with this usage of materialized views because you will only be selecting from them.

For me the kicker is the complexity: mixing EAV and properly defined 3NF tables is confusing. Plus you lose the ability to enforce relational integrity rules and constraints in the database for all your specific products.

I think a better solution would be to keep your proposed PRODUCTS table and use sub-type tables for individual lines. Once you get down to modelling you may find that you need a layer of tables for different types of product - SERVICE_PRODUCTS and THING_PRODUCTS - and relatively few specific Products need their own table. You could use regular views to present "denormalized" projections of each Product.

APC
I should clarify. The denormalized views would not flip EAV into columns but into rows, so they would grow with the data. The EAV schema would not be used for each product, but instead to define product attributes, for example, "Weight" would be an attribute that applies to physical goods, but not to services, so it would be an attribute.
FlySwat
+2  A: 

I've worked with a similar problem only with legal contracts in a multi-tenant application instead of products. Some attributes are common to all and some are arbitrary attributes meant for tracking.

I've said many times that EAVs are like drugs: in small quantities and used in the right circumstances, they can beneficial; too much will kill you. Generally, the rule must be that no query along the lines of "Where Attribute = 'Foo'" or "Where AttributeValue = 'Bar'" can be allowed to be written. No report, no display; nothing. Any use of the EAV must be entirely dynamic and not depend on the existence of a specific attribute row value. In this light, the EAV acts like a bag of data. It can be spit out in a list on a report. You can allow users to choose attributes they wish to see on a report. What you cannot do is filter for specific EAV values. I've heard people use columns with XML data to solve the same problem. The moment someone does want to filter for a specific attribute, that is the trigger to make that attribute a first class attribute as a column.

What makes this compromise solution difficult is enforcement. It is tantalizingly simple to write a query where you filter for a specific attribute. Report designers will be tempted to do just that after they get pressure from management. It requires a firm hand to enforce discipline in the developers that they never filter for a specific attribute. If that kind of discipline cannot be achieved, then I wouldn't recommend making an EAV structure an option.

Thomas