views:

318

answers:

4

I've taken over an ASP.NET application that needs to be re-written. The core functionality of this application that I need to replicate modifies a SQL Server database that is accessed via ODBC from third party software.

The third-party application creates files that represent printer labels, generated by a user. These label files directly reference an ODBC source's fields. Each row of the table represents a product that populates the label's fields. (So, within these files are direct references to the column names of the table.)

The ASP.NET application allows the user to create/update the data for these fields that are referenced by the labels, by adding or editing a particular row representing a product.

It also allows the occasional addition of new fields... where it actually creates a new column in the core table that is referenced by the labels.

My concern: I've never programmatically altered an existing table's columns before. The existing application seems to handle this functionality fine, but before I blindly do the same thing in my new application, I'd like to know what sort of pitfalls exist in doing this, if any... and if there are any obvious alternatives.

+3  A: 

It can become problem when too many columns are added to tables, and you have to be careful if performance is a consideration (covering indexes are not applicable, so expensive bookmark lookups might be performed).

The other alternative is a Key-Value Pair structure: Key Value Pairs in Database design, but that too has it's pitfalls and you are better off creating new columns, as you are suggesting. (KVPs are good for settings)

Mitch Wheat
Yeap- here is a SO post on KV pairs- http://stackoverflow.com/questions/126271/key-value-pairs-in-relational-database
RichardOD
+1  A: 

One risk is the table getting too wide. I used to maintain a horrible app that added 3 columns "automagically" when new values were added to some XML (for some reason it thought everything would be a string a date or a number- hence the creation of 3 columns).

There are other techniques like serializing a BLOB or designing the tables differently that may help.

RichardOD
+2  A: 

One option I think is to use a KVP table for storing dynamic "columns" (as first mentioned by Mitch), join the products table with the KVP table based on the product id then pivot the results in order to have all the dynamic columns in the resultset.


EDIT: something along these lines:

Prepare:

create table Product(ProductID nvarchar(50))

insert Product values('Product1')
insert Product values('Product2')
insert Product values('Product3')

create table ProductKVP(ProductID nvarchar(50), [Key] nvarchar(50), [Value] nvarchar(255))

insert ProductKVP values('Product1', 'Key2', 'Value12')
insert ProductKVP values('Product2', 'Key1', 'Value21')
insert ProductKVP values('Product2', 'Key2', 'Value22')
insert ProductKVP values('Product2', 'Key3', 'Value23')
insert ProductKVP values('Product3', 'Key4', 'Value34')

Retrieve:

declare @forClause nvarchar(max),
        @sql nvarchar(max)

select @forClause = isnull(@forClause + ',', '') + '[' + [Key] + ']' from (
    select distinct [Key] from ProductKVP /* WHERE CLAUSE */
) t

set @forClause = 'for [Key] in (' + @forClause + ')'

set @sql = '
select * from (
select 
    ProductID, [Key], [Value]
from (
        select k.* from 
        Product p
        inner join ProductKVP k on (p.ProductID = k.ProductID)
        /* WHERE CLAUSE */
    ) sq
) t pivot (
    max([Value])' +
    @forClause + '
) pvt'

exec(@sql)

Results:

ProductID   Key1      Key2      Key3      Key4
----------- --------- --------- --------- -------
Product1    NULL      Value12   NULL      NULL
Product2    Value21   Value22   Value23   NULL
Product3    NULL      NULL      NULL      Value34
Sorin Comanescu
+1  A: 

It very much depends on the queries you want to run against those tables. The main disadvantage of KVP is that more complex queries can become very inefficient.

A "hybrid" approach of both might be interesting.

Store the values you want to query in dedicated columns and leave the rest in an XML blob (MS SQL has nice features to even query inside the XML) or alternatively in a KVP bag. Personally I really don't like KVPs in DBs because you cannot build application logic specific indixes anymore.

Just another approach would be not to model the specific columns at all. You create generic "custom attribute" tables like: Attribute1, Attribute2, Attribute3, Attribute4 (for the required data type etc...) You then add meta data to your database that describes what AttrX means for a specific type of printer label.

Again, it really depends on how you want to use that data in the end.

Alex