views:

94

answers:

5

I'm looking for some ideas on methods for persisting customer configurable data in a relational database (SQL Server 2000 in my case).

For example let's say you have a standard order entry application where your customer enters a product they want to buy. In addition to the fields that are important to both you and the customer (itemID, cost etc), you want to allow the client to enter information only relevant to them and persist it for them (for later retrieval on reports or invoices or whatever). You also want the labeling of these "customer fields" to be configured by the customer. So one customer might have a field called "Invoice Number" another customer might have 2 fields called "Invoice#" and "Invoice Date" etc...

I can think of a few ways to do this. You could have a customerfields table with some reasonable number of varchar fields related to each transaction and then another table clientcustomerfields which contains the meta data about how many fields a customer uses, what the field names are etc. Alternatively you could use XML to persist the customer data so you don't have to worry about filly up X # of fields, you'd still need some table to describe the customers meta data (maybe through an XSD).

Are there any standard ways of doing this type of thing?

+1  A: 

As a generalisation I would recommend against using opaque XML blobs to store field-oriented data in a relational database.

  1. The best solution is to have some 'user' fields and configuration within the application to set up how these fields are used and presented. If the fields are varchars the overhead for empty fields is fairly minimal, IIRC about 1 byte per field. Although this looks inelegant and has a finite number of fields, it is the simplest to query and populate which makes it the fastest. One option would be to make the configuration agnostic to the number of fields and simply run a script to add a few more fields if you need them.

  2. Another option is to have a 'coding' table hanging off entities which user-configurable fields. It has 'entity ID', 'field type' and 'field code' columns where the 'field type' column denotes the actual content. The particular disadvantage is that it makes queries slower as they have to potentially join against this table multiple times.

I've actually seen both (1) and (2) in use on the same system. The vendor originally started with (2) but then found it to be a pain in the arse and subsequent subsystems on the application went to using (1). This change in approach was borne out of bitter experience.

The principal strike against XML blobs is that they are not first class citizens in the database schema. The DBMS cannot enforce referential integrity on the blob by itself, it cannot index individual columns within the blob and querying the data from the blob is more complex and may not be supported by reporting tools. In addition, the content of the blob is completely opaque to the system data dictionary. Anyone trying to extract the data back out of the system is dependent on the application's documentation to get any insight into the contents.

ConcernedOfTunbridgeWells
Once the Op upgrades to a new version of SQL Server the XML can then be indexed with XML Indexes. There's nothing wrong with storing data that you don't ever need to search against as an XML data if that's the easiest way for the app to deal with it. I've got several places when I store XML data as a value in a field, because the website needs to be able to store a variety of customer settings data, but we don't want to redesign the tables each time the site needs to add another item (or have a named value pairs table). Since the database doesn't care about the value, storing as XML works.
mrdenny
People tend to forget that application developers aren't the only stakeholders in a database. Just because 'I' don't need to search an opaque field doesn't mean that no-one needs to search it.
ConcernedOfTunbridgeWells
A: 

In addition to your own suggestions, another way is to look at the Profile Provider system in ASP.net (Assuming a MS tech stack on this). You can extend the ProfileBase to include 2 arrays representing your user defined keys and another for the corresponding values. At that point, the SqlProfileProvider will handle the storage and retrieval of such and create your implementation of the ProfileBase object. Ultimately, this would be similar to if you were trying to use the ProfileProvider system in a Web Application project and not a Web Site project (which use different build managers).

JamesEggers
+2  A: 

One strategy I'd use:

customer_fields
- field_id
- cusomer_id
- field_name

customer_transaction_field_values
- transaction_id
- field_id
- field_value

Matt Wrock
Having a table that has the a FK to the customer and then these fields would be a good, database only solution and fairly easy to sort and implement if/as needed.
JamesEggers
the fk to the customers table would probably best be in the transaction table
Matt Wrock
I'm aware of at least one product that started out using this type of approach and then abandoned it because it doesn't work as well at you might think. The field_values table gets quite large and queries often involve joining against the table multiple times. Putting a clustered index on the table mitigates the worst of the performance drain but the table is still relatively fiddly to query.
ConcernedOfTunbridgeWells
A: 

I have done this in the past and used the concept of user defined fields. I would create four tables for the basic types:

  1. UDFCharacter - id - int, order_id - int, value - varchar(4000)
  2. UDFNumber - id - int, order_id - int, value - float
  3. UDFDateTime - id - int, order_id - int, value - datetime
  4. UDFText - id - int, order_id - int, value - text

I would then have a table that described the fields along with their type:

CustomField - id - int, customer_id - int (linked to customer table), fieldType - 'UDFCharacter, UDFNumber, etc', name - varchar, and other meta info

The responses to the fields go in the UDF tables. The fields get displayed on the page based on the CustomField table. Our system was may more complex and required more tbales but this seems like it would work.

efleming
to get he benefit of strong(er) typing you can also implement this approach using the SQL_Variant data type in Sql Serverhttp://msdn.microsoft.com/en-us/library/ms173829.aspx
JasonHorner
+1  A: 

You should read Best Practices for Semantic Data Modeling for Performance and Scalability. This is exactly the question addressed by the white paper in the link.

Remus Rusanu