tags:

views:

44

answers:

4

Lets say you have a relational DB table like INVENTORY_ITEM. It's generic in the sense that anything that's in inventory needs a record here. Now lets say there are tons of different types of inventory and each different type might have unique fields that they want to keep track of (e.g. forks might track the number of tines, but refrigerators wouldn't have a use for that field). These fields must be user-definable per category type.

There are many ways to solve this:

  1. Use ALTER TABLE statements to actually add nullable columns on the fly (yuk)
  2. Have two tables with a one-to-one mapping, INVENTORY_ITEM, and INVENTORY_ITEM_USER, and use ALTER TABLE statements to add and remove nullable columns from the latter table on the fly (a bit nicer).
  3. Add a CUSTOM_PROPERTY table, and a CUSTOM_PROPERTY_VALUE table, and add/remove rows in CUSTOM_PROPERTY when the user adds and removes rows, and store the values in the latter table. This is nice and generic, but the performance would suffer. If you had an average of 20 values per item, the number of rows in CUSTOM_PROPERTY_VALUE goes up at 20 times the rate, and you still need to include columns in CUSTOM_PROPERTY_VALUE for every different data type that you might want to store.
  4. Have one big varchar(MAX) field on INVENTORY_ITEM to store custom properties as XML.
  5. I guess you could have individual tables for each category type that hangs off the INVENTORY_ITEM table, and these get created/destroyed on the fly when the user creates inventory types, and the columns get updated when they add/remove properties to those types. Seems messy though.

Is there a best-practice for this? It seems to me that option 4 is clean, but doesn't allow you to easily search by the metadata. I've used a variant of 3 before, but only on a table that had a really small number of rows, so performance wasn't an issue. It always seemed to me that 2 was a good idea, but it doesn't fit well with auto-generated entity frameworks, so you'd have to exclude the custom properties table from the entity generation and just write your own custom data access code to handle it.

Am I missing any alternatives? Is there a way for SQL server to "look into" XML data in a column so it could actually do stuff with option 4 now?

+2  A: 

Definitely the 3.

Sometimes 4 if you have a very good reason to do so.

Do not ever dynamically modify database structure to accommodate for incoming data. One day something could break and damage your database. It is simply not done this way.

Developer Art
FYI, it's my understanding that the Deltek ERP system uses a variant of option 2 to store user-defined columns.
Scott Whitlock
Well if someone does it the wrong way does not automatically mean it is a good thing to do as well :-)
Yves M.
Sure, but in that case it allows them to index certain columns for searching. Using option 3 means you'd have to index all "varchar" type columns together, so if you're searching for "model 1234" you'll be searching all varchar columns, not just the model number column for refrigerators (to be fair, your joins should narrow it down). Option 2 also gives you the ability to have foreign key columns to other entities, which you just can't do with option 3 (or 4, as far as I can tell).
Scott Whitlock
For Option 4 you can use xpath to query and join information stored as xml. which is just pretty cool.
Yves M.
@Yves I've never found xpath and xml cool in TSQL.
Chuck Conway
@Chuck Conway: well this is a bit like smoking in a car. some like it others don't...
Yves M.
@Yves In general, people agree smoking is bad for you...
Chuck Conway
@Chuck Conway: Argh. Got me.
Yves M.
+3  A: 

I am using the xml type column for this kind of situations...

http://msdn.microsoft.com/en-us/library/ms189887.aspx

Before xml we had to use the option 3. Which in my point of view is still a good way to do it. Espacialy if you have a Data Access Layer that is able to handle the type conversion properly for you. We stored everything as string values and defined a column that held the orignial data type for the conversion.

Options 1 and 2 are a no-go. Don't change the database schema in production on the fly.

Option 5 could be done in a separate database... But still no control over the schema and the user would need the rights to create tables etc.

Yves M.
Useful information. Here's an article about the XML datatype with examples: http://www.developer.com/db/article.php/3531196/The-Fundamentals-of-the-SQL-Server-2005-XML-Datatype.htm
Scott Whitlock
On a related note, if you are using LINQ to SQL as the DAL, then LINQ can't query the XML datatype directly, but you can encapsulate the XML query into a UDF, and LINQ to SQL supports UDF's: http://weblogs.asp.net/scottgu/archive/2007/08/16/linq-to-sql-part-6-retrieving-data-using-stored-procedures.aspx That means you could return custom columns from a LINQ expression, and filter or select on them. I believe you'd just pass in the custom column name (string) as a parameter to the UDF.
Scott Whitlock
+2  A: 

3 or 4 are the only ones I would consider - you don't want to be changing the schema on the fly, especially if you're using some kind of mapping layer.

I've generally gone with option 3. As a bit of sanity, I always have a type column in the CUSTOM_PROPERTY table, which is repeated in the CUSTOM_PROPERTY_VALUE table. By adding a superkey to the CUSTOM_PROPERTY table of <Primary Key, Type>, you can then have a foreign key that references this (as well as the simpler foreign key to just the primary key). And finally, a check constraint that ensures that only the relevant column in CUSTOM_PROPERTY_VALUE is not null, based on this type column.

In this way, you know that if someone has defined a CUSTOM_PROPERTY, say, Tine count, of type int, that you're actually only ever going to find an int stored in the CUSTOM_PROPERTY_VALUE table, for all instances of this property.

Edit

If you need it to reference multiple entity tables, then it can get more complex, especially if you want full referential integrity. For instance (with two distinct entity types in the database):

    create table dbo.Entities (
        EntityID uniqueidentifier not null,
        EntityType varchar(10) not null,
        constraint PK_Entities PRIMARY KEY (EntityID),
        constraint CK_Entities_KnownTypes CHECK (
            EntityType in ('Foo','Bar')),
        constraint UQ_Entities_KnownTypes UNIQUE (EntityID,EntityType)
    )
    go
    create table dbo.Foos (
        EntityID uniqueidentifier not null,
        EntityType as CAST('Foo' as varchar(10)) persisted,
        FooFixedProperty1 int not null,
        FooFixedProperty2 varchar(150) not null,
        constraint PK_Foos PRIMARY KEY (EntityID),
        constraint FK_Foos_Entities FOREIGN KEY (EntityID) references dbo.Entities (EntityID) on delete cascade,
        constraint FK_Foos_Entities_Type FOREIGN KEY (EntityID,EntityType) references dbo.Entities (EntityID,EntityType)
    )
    go
    create table dbo.Bars (
        EntityID uniqueidentifier not null,
        EntityType as CAST('Bar' as varchar(10)) persisted,
        BarFixedProperty1 float not null,
        BarFixedProperty2 int not null,
        constraint PK_Bars PRIMARY KEY (EntityID),
        constraint FK_Bars_Entities FOREIGN KEY (EntityID) references dbo.Entities (EntityID) on delete cascade,
        constraint FK_Bars_Entities_Type FOREIGN KEY (EntityID,EntityType) references dbo.Entities (EntityID,EntityType)
    )
    go
    create table dbo.ExtendedProperties (
        PropertyID uniqueidentifier not null,
        PropertyName varchar(100) not null,
        PropertyType int not null,
        constraint PK_ExtendedProperties PRIMARY KEY (PropertyID),
        constraint CK_ExtendedProperties CHECK (
            PropertyType between 1 and 4), --Or make type a varchar, and change check to IN('int', 'float'), etc
        constraint UQ_ExtendedProperty_Names UNIQUE (PropertyName),
        constraint UQ_ExtendedProperties_Types UNIQUE (PropertyID,PropertyType)
    )
    go
    create table dbo.PropertyValues (
        EntityID uniqueidentifier not null,
        PropertyID uniqueidentifier not null,
        PropertyType int not null,
        IntValue int null,
        FloatValue float null,
        DecimalValue decimal(15,2) null,
        CharValue varchar(max) null,
        EntityType varchar(10) not null,
        constraint PK_PropertyValues PRIMARY KEY (EntityID,PropertyID),
        constraint FK_PropertyValues_ExtendedProperties FOREIGN KEY (PropertyID) references dbo.ExtendedProperties (PropertyID) on delete cascade,
        constraint FK_PropertyValues_ExtendedProperty_Types FOREIGN KEY (PropertyID,PropertyType) references dbo.ExtendedProperties (PropertyID,PropertyType),
        constraint FK_PropertyValues_Entities FOREIGN KEY (EntityID) references dbo.Entities (EntityID) on delete cascade,
        constraint FK_PropertyValues_Entitiy_Types FOREIGN KEY (EntityID,EntityType) references dbo.Entities (EntityID,EntityType),
        constraint CK_PropertyValues_OfType CHECK (
            (IntValue is null or PropertyType = 1) and
            (FloatValue is null or PropertyType = 2) and
            (DecimalValue is null or PropertyType = 3) and
            (CharValue is null or PropertyType = 4)),
        --Shoot for bonus points
        FooID as CASE WHEN EntityType='Foo' THEN EntityID END persisted,
        constraint FK_PropertyValues_Foos FOREIGN KEY (FooID) references dbo.Foos (EntityID),
        BarID as CASE WHEN EntityType='Bar' THEN EntityID END persisted,
        constraint FK_PropertyValues_Bars FOREIGN KEY (BarID) references dbo.Bars (EntityID)
    )
    go
    --Now we wrap up inserts into the Foos, Bars and PropertyValues tables as either Stored Procs, or instead of triggers
    --To get the proper additional columns and/or base tables populated
Damien_The_Unbeliever
Lets say *all* the tables in your database use uniqueidentifier columns as primary keys. Could you generalize the CUSTOM_PROPERTY_x tables to refer to *any* table in the database, since we can safely assume that all primary keys in the database are unique? Anyone done that?
Scott Whitlock
Are you asking about, where you've commented elsewhere, that you want some properties to be foreign keys? Or are you meaning that you've got multiple tables that you wish to extend with additional properties, and you wish to place all of the additional properties in the same CUSTOM_PROPERTY_VALUE table?
Damien_The_Unbeliever
@Damien_The_Unbeliever: The latter. Example: a general NOTES table where you want the user to be able to attach notes to anything in the database, and handle it all in a contained module. Any entity can pass in a request to the notes module and just has to give its primary key as a parameter.
Scott Whitlock
Nice code example. A slightly modified version could work very well for a "tagging" implementation (where you want the user to be able to "tag" different entity types with tags that could apply to all of them).
Scott Whitlock
Yeah. In short, I love computed columns, and using Super Keys to enforce stricter relationships than the plain foreign keys would give you.
Damien_The_Unbeliever
A: 

My inclination would be to store things as XML if the database supports that nicely, or else have a small number of different tables for different data types (try to format data so it will fit one of a small number of types--don't use one table for VARCHAR(15), another for VARCHAR(20), etc.) Something like #5, but with all tables pre-created, and everything shoehorned into the existing tables. Each row should hold a main-record ID, record-type indicator, and a piece of data. Set up an index based on record-type, subsorted by data, and it will be possible to query for particular field values (where RecType==19 and Data=='Fred'). Querying for records that match multiple field values would be harder, but such is life.

supercat