The current structure is as follows:

Table RowType:    RowTypeID

Table RowSubType: RowSubTypeID
                  FK_RowTypeID

Table ColumnDef:  FK_RowTypeID
                  FK_RowSubTypeID (nullable)

In short, I'm mapping column definitions to rows. Some rows have subtypes, and those subtypes have column definitions specific to them. Alternatively, I could hang the subtype-specific column definitions off their own table, or I could combine the data in RowType and RowSubType into one table and work with a single ID. I'm not sure either is a better solution, though if anything I'd lean towards the latter, since we mostly end up pulling ColumnDefs for a given RowType/RowSubType combination.

Is the current design SQL blasphemy?

If I keep the current structure, how do I enforce that, when FK_RowSubTypeID is specified in ColumnDef, it corresponds to the RowType specified by FK_RowTypeID? Should I enforce this with a trigger, or am I missing a simple redesign that would solve the problem?
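The sketch below makes the problem concrete. It builds the current structure with Python's built-in `sqlite3` module. SQLite is an assumption, since the question doesn't name a DBMS, and the IDs are made up for illustration. Each single-column foreign key is checked on its own, so a ColumnDef row can pair row type 2 with a subtype that belongs to row type 1.

```python
import sqlite3

# Sketch of the question's current structure, assuming SQLite.
# Each foreign key below checks one column on its own.
SCHEMA = """
CREATE TABLE RowType (
    RowTypeID INTEGER PRIMARY KEY
);
CREATE TABLE RowSubType (
    RowSubTypeID INTEGER PRIMARY KEY,
    FK_RowTypeID INTEGER NOT NULL REFERENCES RowType (RowTypeID)
);
CREATE TABLE ColumnDef (
    ColumnDefID     INTEGER PRIMARY KEY,
    FK_RowTypeID    INTEGER NOT NULL REFERENCES RowType (RowTypeID),
    FK_RowSubTypeID INTEGER REFERENCES RowSubType (RowSubTypeID)  -- nullable
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO RowType VALUES (?)", [(1,), (2,)])
conn.execute("INSERT INTO RowSubType VALUES (10, 1)")  # subtype 10 belongs to row type 1

# Row type 2 paired with subtype 10: both IDs exist individually,
# so this inconsistent row is accepted.
conn.execute("INSERT INTO ColumnDef VALUES (100, 2, 10)")

mismatches = conn.execute("""
    SELECT c.ColumnDefID
    FROM ColumnDef AS c
    JOIN RowSubType AS s ON s.RowSubTypeID = c.FK_RowSubTypeID
    WHERE s.FK_RowTypeID <> c.FK_RowTypeID
""").fetchall()
print(mismatches)  # [(100,)]
```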

A:

What you're having trouble with is Fourth Normal Form.

Here's the solution:

Table RowSubType:       RowSubTypeID
                        FK_RowTypeID
                        UNIQUE(FK_RowTypeID, RowSubTypeID) 

Table ColumnDef:        ColumnDefID
                        FK_RowTypeID
                        UNIQUE(ColumnDefID, FK_RowTypeID) 

Table ColumnDefSubType: FK_ColumnDefID   } compound foreign key to ColumnDef
                        FK_RowTypeID     }   } 
                        FK_RowSubTypeID      } compound foreign key to RowSubType

You only need to create a row in the ColumnDefSubType table for column definitions that apply to a row subtype. Because every reference is constrained by a foreign key, you can't create an anomaly.
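The following sketch expresses this design as SQLite DDL, run from Python's `sqlite3` module. SQLite and the sample IDs are assumptions for illustration. FK_RowTypeID in ColumnDefSubType takes part in both compound foreign keys, so it must agree with the ColumnDef row and with the RowSubType row at the same time.

```python
import sqlite3

# The answer's design in SQLite syntax (an assumption; any DBMS with
# composite foreign keys to UNIQUE constraints works the same way).
SCHEMA = """
CREATE TABLE RowType (
    RowTypeID INTEGER PRIMARY KEY
);
CREATE TABLE RowSubType (
    RowSubTypeID INTEGER PRIMARY KEY,
    FK_RowTypeID INTEGER NOT NULL REFERENCES RowType (RowTypeID),
    UNIQUE (FK_RowTypeID, RowSubTypeID)
);
CREATE TABLE ColumnDef (
    ColumnDefID  INTEGER PRIMARY KEY,
    FK_RowTypeID INTEGER NOT NULL REFERENCES RowType (RowTypeID),
    UNIQUE (ColumnDefID, FK_RowTypeID)
);
CREATE TABLE ColumnDefSubType (
    FK_ColumnDefID  INTEGER NOT NULL,
    FK_RowTypeID    INTEGER NOT NULL,
    FK_RowSubTypeID INTEGER NOT NULL,
    FOREIGN KEY (FK_ColumnDefID, FK_RowTypeID)
        REFERENCES ColumnDef (ColumnDefID, FK_RowTypeID),
    FOREIGN KEY (FK_RowTypeID, FK_RowSubTypeID)
        REFERENCES RowSubType (FK_RowTypeID, RowSubTypeID)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO RowType VALUES (?)", [(1,), (2,)])
conn.execute("INSERT INTO RowSubType VALUES (10, 1)")  # subtype 10 of row type 1
conn.executemany("INSERT INTO ColumnDef VALUES (?, ?)", [(100, 1), (200, 2)])

# Consistent: column def 100 and subtype 10 both belong to row type 1.
conn.execute("INSERT INTO ColumnDefSubType VALUES (100, 1, 10)")


def rejected(row) -> bool:
    """Return True if the database refuses this ColumnDefSubType row."""
    try:
        conn.execute("INSERT INTO ColumnDefSubType VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        return True
    return False


# Column def 200 is row type 2, but (2, 10) is not a RowSubType key.
bad_subtype = rejected((200, 2, 10))
# Claiming column def 200 is row type 1 fails the ColumnDef key instead.
bad_rowtype = rejected((200, 1, 10))
print(bad_subtype, bad_rowtype)  # True True
```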

But for what it's worth, I agree with @Seth's comment about possible over-engineering. I'm not sure I understand how you're using these column defs and row types, but it smells like the Inner-Platform Effect anti-pattern. In SQL, just use metadata to define metadata. Don't try to use data to create a dynamic schema.

See also this excellent story: Bad CaRMa.


Re your comment: in your case I'd recommend using Class Table Inheritance or Concrete Table Inheritance. This means defining a separate table per subtype, where each column of your original text record goes into the corresponding column of the subtype table. That way you don't need your RowType or RowSubType tables, because the types are implicit in the tables you define for each subtype. And you don't need your ColumnDef table, because the column definitions are implicit in the columns declared in those tables.
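Here is a rough sketch of Concrete Table Inheritance, again assuming SQLite. The two record tables and their columns are hypothetical stand-ins for real subtypes. Each table declares all of its own columns. The "column definitions" come back from the catalog through SQLite's `pragma_table_info` table-valued function instead of being stored as data. Class Table Inheritance would instead move the shared columns into a common parent table that each subtype table references.

```python
import sqlite3

# Hypothetical subtypes; table and column names are illustrative only.
# Concrete Table Inheritance: each subtype table declares every column its
# records carry, including the ones shared with other subtypes.
SCHEMA = """
CREATE TABLE HeaderRecord (
    RecordID   INTEGER PRIMARY KEY,
    SourceFile TEXT NOT NULL,
    BatchDate  TEXT NOT NULL
);
CREATE TABLE DetailRecord (
    RecordID   INTEGER PRIMARY KEY,
    SourceFile TEXT NOT NULL,
    AccountNo  TEXT NOT NULL,
    Amount     NUMERIC NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)


def column_defs(table: str) -> list:
    """Read a subtype's column definitions from the catalog, not from data rows."""
    return conn.execute(
        "SELECT name, type FROM pragma_table_info(?) ORDER BY cid", (table,)
    ).fetchall()


print(column_defs("DetailRecord"))
# [('RecordID', 'INTEGER'), ('SourceFile', 'TEXT'), ('AccountNo', 'TEXT'), ('Amount', 'NUMERIC')]
```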

See also my answer to "Product table, many kinds of product, each product has many parameters" or my presentation slides "Practical Object-Oriented Models in SQL".

Bill Karwin
I'm not sure this fits our situation... am I just not seeing it? These tables map rows of text data, and those rows may have subtypes. The subtype is specified by a column in the actual row (ugh), so a row of a particular type/subtype will have variable column data (number of columns, column types, start and stop positions if fixed-width, etc.). I need to be able to retrieve all the column defs for a particular type/subtype combination (including a null subtype for row types that don't define one).
Brian
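The sketch below shows that lookup against the original ColumnDef table, assuming SQLite. There, `IS` behaves like `=` except that NULL IS NULL is true, so one parameterized query covers both a specific subtype and the "no subtype" case. The ColumnName/StartPos/EndPos columns and the sample rows are hypothetical. Foreign keys are left out to keep it short.

```python
import sqlite3
from typing import List, Optional

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ColumnDef (
    ColumnDefID     INTEGER PRIMARY KEY,
    FK_RowTypeID    INTEGER NOT NULL,
    FK_RowSubTypeID INTEGER,          -- NULL for row types without subtypes
    ColumnName      TEXT NOT NULL,    -- hypothetical layout columns
    StartPos        INTEGER NOT NULL,
    EndPos          INTEGER NOT NULL
);
INSERT INTO ColumnDef VALUES
    (1, 1, NULL, 'RecordType', 1, 2),
    (2, 1, NULL, 'BatchDate',  3, 10),
    (3, 2, 10,   'AccountNo',  3, 12),
    (4, 2, 11,   'Amount',     3, 14);
""")


def column_defs(row_type: int, sub_type: Optional[int]) -> List[str]:
    # In SQLite, "IS ?" matches NULL to NULL as well as equal values;
    # "= ?" would never match a NULL subtype.
    rows = conn.execute(
        """SELECT ColumnName FROM ColumnDef
           WHERE FK_RowTypeID = ? AND FK_RowSubTypeID IS ?
           ORDER BY StartPos""",
        (row_type, sub_type),
    )
    return [name for (name,) in rows]


print(column_defs(1, None))  # ['RecordType', 'BatchDate']
print(column_defs(2, 10))    # ['AccountNo']
```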
The problem with defining a separate table per subtype is that it would break the flexibility, which is the entire reason for putting the mapping in tables. The goal is an import/processing engine that is data-driven and flexible enough to handle diverse sources (I probably should have mentioned this from the start). It makes more sense to me to combine the RowType and RowSubType tables into a single table, even though that is uglier to me than my original direction. Using a trigger to keep the initial design seems hacky, but I'm still tempted. Even so, I'd rather find an alternative.
Brian
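If the two tables are combined, there is one wrinkle. In SQLite, a UNIQUE constraint treats NULLs as distinct, so it won't stop two "no subtype" rows for the same row type. The sketch below assumes SQLite and adds a partial unique index to cover that case. The table name RowLayout and the codes HDR/DTL are hypothetical. ColumnDef then holds a single reference, so the type/subtype mismatch from the question can't be expressed at all.

```python
import sqlite3

# Hypothetical combined table: one row per (row type, subtype) combination,
# with RowSubType NULL for row types that define no subtype.
SCHEMA = """
CREATE TABLE RowLayout (
    RowLayoutID INTEGER PRIMARY KEY,
    RowType     TEXT NOT NULL,
    RowSubType  TEXT,
    UNIQUE (RowType, RowSubType)
);
-- UNIQUE treats NULLs as distinct in SQLite, so this partial index stops a
-- row type from getting two "no subtype" layouts.
CREATE UNIQUE INDEX RowLayout_NoSubType ON RowLayout (RowType)
    WHERE RowSubType IS NULL;
CREATE TABLE ColumnDef (
    ColumnDefID    INTEGER PRIMARY KEY,
    FK_RowLayoutID INTEGER NOT NULL REFERENCES RowLayout (RowLayoutID),
    ColumnName     TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO RowLayout (RowLayoutID, RowType, RowSubType) VALUES (?, ?, ?)",
    [(1, "HDR", None), (2, "DTL", "A"), (3, "DTL", "B")],
)

try:
    conn.execute("INSERT INTO RowLayout (RowType, RowSubType) VALUES ('HDR', NULL)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```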
(Also, performance need not be considered)
Brian
You really need to read the *Bad CaRMa* article.
Bill Karwin
I did :-) Performance really isn't an issue in this case, as these will never be very large or hit very hard. And we really do need the flexibility in order to quickly bring new clients into the system.
Brian
Thanks for all the help; it is much appreciated. I think I'm going to go with the original design and use a trigger to enforce the relationship, ugly as that is. But in exploring the options you put forth, I learned a lot, quite a bit of which relates to other areas of the project. What is your opinion of Microsoft's SQL Data Services? From what I gleaned, it presents itself logically as EAV while implementing Concrete Table Inheritance. Best of both worlds?
Brian
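For reference, the sketch below shows a trigger along these lines. It uses SQLite syntax, which is an assumption; SQL Server and other DBMSs write triggers differently. It rejects an INSERT or UPDATE on ColumnDef whose FK_RowSubTypeID belongs to a different row type, and it leaves rows with a NULL subtype alone.

```python
import sqlite3

# The question's original structure plus enforcement triggers (SQLite syntax).
SCHEMA = """
CREATE TABLE RowType (RowTypeID INTEGER PRIMARY KEY);
CREATE TABLE RowSubType (
    RowSubTypeID INTEGER PRIMARY KEY,
    FK_RowTypeID INTEGER NOT NULL REFERENCES RowType (RowTypeID)
);
CREATE TABLE ColumnDef (
    ColumnDefID     INTEGER PRIMARY KEY,
    FK_RowTypeID    INTEGER NOT NULL REFERENCES RowType (RowTypeID),
    FK_RowSubTypeID INTEGER REFERENCES RowSubType (RowSubTypeID)  -- nullable
);
CREATE TRIGGER ColumnDef_SubTypeCheck_Insert
BEFORE INSERT ON ColumnDef
WHEN NEW.FK_RowSubTypeID IS NOT NULL
 AND NOT EXISTS (SELECT 1 FROM RowSubType
                 WHERE RowSubTypeID = NEW.FK_RowSubTypeID
                   AND FK_RowTypeID = NEW.FK_RowTypeID)
BEGIN
    SELECT RAISE(ABORT, 'RowSubType does not belong to RowType');
END;
CREATE TRIGGER ColumnDef_SubTypeCheck_Update
BEFORE UPDATE OF FK_RowTypeID, FK_RowSubTypeID ON ColumnDef
WHEN NEW.FK_RowSubTypeID IS NOT NULL
 AND NOT EXISTS (SELECT 1 FROM RowSubType
                 WHERE RowSubTypeID = NEW.FK_RowSubTypeID
                   AND FK_RowTypeID = NEW.FK_RowTypeID)
BEGIN
    SELECT RAISE(ABORT, 'RowSubType does not belong to RowType');
END;
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO RowType VALUES (?)", [(1,), (2,)])
conn.execute("INSERT INTO RowSubType VALUES (10, 1)")  # subtype 10 of row type 1

conn.execute("INSERT INTO ColumnDef VALUES (100, 1, 10)")    # matching pair
conn.execute("INSERT INTO ColumnDef VALUES (101, 2, NULL)")  # no subtype: allowed
try:
    conn.execute("INSERT INTO ColumnDef VALUES (102, 2, 10)")  # mismatched pair
    mismatch_rejected = False
except sqlite3.DatabaseError:
    # RAISE(ABORT, ...) in the trigger makes the INSERT fail with an error.
    mismatch_rejected = True

print(mismatch_rejected)  # True
```

This only guards ColumnDef. If RowSubType.FK_RowTypeID is changed later, existing column definitions could still end up inconsistent unless RowSubType gets a similar trigger.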
From what I've read, Microsoft scrapped the first implementation of SDS, which was EAV, when they discovered that customers really needed a relational database. They replaced it with a cloud-based version of Microsoft SQL Server, even down to the protocol level.
Bill Karwin
Bummer. It would have been nice if they had done both.
Brian