We develop a commercial application, and our customers are asking for support for custom fields. For instance, they want to add a field to the Customer form.

What are the known design patterns to store the field values and the meta-data about the fields?

I see these options for now:

Option 1: Add Field1, Field2, Field3, Field4 columns of type varchar to my Customer table.

Option 2: Add a single column of type XML to the Customer table and store the custom fields' values as XML.
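As a rough illustration of Option 2, here is a minimal sketch. It assumes SQLite through Python's `sqlite3` module; the question doesn't name a database, and SQLite has no native XML type, so a TEXT column stands in for it. The field names `LastVisit` and `Discount` and the helpers `to_xml`/`from_xml` are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema: Customer gets one extra text column holding an XML document.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT, CustomFields TEXT)")

def to_xml(fields: dict) -> str:
    """Serialize a dict of custom field values into an XML string."""
    root = ET.Element("CustomFields")
    for name, value in fields.items():
        field = ET.SubElement(root, "Field", name=name)  # keyword args become attributes
        field.text = str(value)
    return ET.tostring(root, encoding="unicode")

def from_xml(xml_text: str) -> dict:
    """Parse the XML back into a dict; every value comes back as a string."""
    root = ET.fromstring(xml_text)
    return {f.get("name"): f.text for f in root.findall("Field")}

conn.execute(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    (10001, "Acme", to_xml({"LastVisit": "02/12/2009 8:00 AM", "Discount": 18.26})),
)
stored = conn.execute("SELECT CustomFields FROM Customer WHERE CustomerID = 10001").fetchone()[0]
print(from_xml(stored))
```

Note that the number 18.26 comes back as the string "18.26". With this approach, any type information has to live in the application or in the XML itself.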

Option 3: Add a CustomerCustomFieldValue table with a varchar column and store the values in that column. The table would also have CustomerID and CustomFieldID columns.

CustomerID,  CustomFieldID, Value
10001,       1001,          '02/12/2009 8:00 AM'
10001,       1002,          '18.26'
10002,       1001,          '01/12/2009 8:00 AM'
10002,       1002,          '50.26'

CustomFieldID would be an ID from another table called CustomField with these columns: CustomFieldID, FieldName, FieldValueTypeID.
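A minimal sketch of the Option 3 tables, loaded with the sample rows above. It assumes SQLite via Python's `sqlite3`. The field names `LastVisit` and `Balance`, the customer names, and the meaning of the `FieldValueTypeID` codes are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE CustomField (
    CustomFieldID     INTEGER PRIMARY KEY,
    FieldName         TEXT NOT NULL,
    FieldValueTypeID  INTEGER NOT NULL  -- hypothetical codes: 1 = date, 2 = numeric
);
CREATE TABLE CustomerCustomFieldValue (
    CustomerID     INTEGER NOT NULL REFERENCES Customer(CustomerID),
    CustomFieldID  INTEGER NOT NULL REFERENCES CustomField(CustomFieldID),
    Value          TEXT,                -- every value is stored as a string
    PRIMARY KEY (CustomerID, CustomFieldID)
);
""")
conn.executemany("INSERT INTO Customer VALUES (?, ?)", [(10001, "Acme"), (10002, "Globex")])
conn.executemany("INSERT INTO CustomField VALUES (?, ?, ?)",
                 [(1001, "LastVisit", 1), (1002, "Balance", 2)])
conn.executemany("INSERT INTO CustomerCustomFieldValue VALUES (?, ?, ?)", [
    (10001, 1001, "02/12/2009 8:00 AM"), (10001, 1002, "18.26"),
    (10002, 1001, "01/12/2009 8:00 AM"), (10002, 1002, "50.26"),
])

# All custom fields for one customer, with their names from the metadata table.
rows = conn.execute("""
    SELECT f.FieldName, v.Value
    FROM CustomerCustomFieldValue v
    JOIN CustomField f ON f.CustomFieldID = v.CustomFieldID
    WHERE v.CustomerID = ?
    ORDER BY f.CustomFieldID
""", (10001,)).fetchall()
print(rows)
```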

Option 4: Add a CustomerCustomFieldValue table with a column for each possible value type and store each value in the matching column. This is similar to #3, but field values are stored in strongly typed columns.

CustomerID,  CustomFieldID, DateValue,           StringValue,       NumericValue                 
10001,       1001,          02/12/2009 8:00 AM,  null,              null
10001,       1002,          null,                null,              18.26
10002,       1001,          01/12/2009 8:00 AM,  null,              null
10002,       1002,          null,                null,              50.26
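A minimal sketch of Option 4 in SQLite via Python's `sqlite3`, with a CHECK constraint that forces exactly one typed column to be non-null. The dates are rewritten as ISO-8601 text because SQLite has no DATE type; that conversion assumes the question's dates are month/day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE CustomerCustomFieldValue (
    CustomerID    INTEGER NOT NULL,
    CustomFieldID INTEGER NOT NULL,
    DateValue     TEXT,   -- ISO-8601 text, which sorts chronologically
    StringValue   TEXT,
    NumericValue  REAL,
    PRIMARY KEY (CustomerID, CustomFieldID),
    -- exactly one of the typed columns must be filled in (each IS NOT NULL yields 0 or 1)
    CHECK ((DateValue IS NOT NULL) + (StringValue IS NOT NULL) + (NumericValue IS NOT NULL) = 1)
)
""")
conn.executemany("INSERT INTO CustomerCustomFieldValue VALUES (?, ?, ?, ?, ?)", [
    (10001, 1001, "2009-02-12 08:00", None, None),
    (10001, 1002, None, None, 18.26),
    (10002, 1001, "2009-01-12 08:00", None, None),
    (10002, 1002, None, None, 50.26),
])

# The database can now aggregate and compare numbers natively.
total = conn.execute(
    "SELECT SUM(NumericValue) FROM CustomerCustomFieldValue WHERE CustomFieldID = 1002"
).fetchone()[0]

# A row with two typed values filled in violates the CHECK constraint.
try:
    conn.execute("INSERT INTO CustomerCustomFieldValue VALUES (10003, 1002, NULL, 'x', 1.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(round(total, 2), rejected)
```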

Option 5: Options 3 and 4 use a table specific to a single concept (Customer). Our clients are asking for custom fields in other forms as well. Should we instead have a system-wide custom field storage system? So instead of having multiple tables such as CustomerCustomFieldValue, EmployeeCustomFieldValue, and InvoiceCustomFieldValue, we would have a single table named CustomFieldValue. Although it seems more elegant to me, wouldn't that cause a performance bottleneck?
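One way Option 5 could look is sketched below. It uses SQLite via Python's `sqlite3`. The `EntityType` discriminator column, the index, and the `ShirtSize` field are hypothetical design choices, not something from the question. The index on `EntityID` targets the usual lookup ("all fields of one entity"), which is where the bottleneck concern would bite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomField (
    CustomFieldID INTEGER PRIMARY KEY,
    EntityType    TEXT NOT NULL,     -- hypothetical discriminator: 'Customer', 'Employee', ...
    FieldName     TEXT NOT NULL
);
CREATE TABLE CustomFieldValue (
    CustomFieldID INTEGER NOT NULL REFERENCES CustomField(CustomFieldID),
    EntityID      INTEGER NOT NULL,  -- cannot be a real FK: it points into different tables
    Value         TEXT,
    PRIMARY KEY (CustomFieldID, EntityID)
);
-- Support the common lookup "all custom fields of one entity".
CREATE INDEX IX_CustomFieldValue_Entity ON CustomFieldValue (EntityID, CustomFieldID);
""")
conn.executemany("INSERT INTO CustomField VALUES (?, ?, ?)", [
    (1001, "Customer", "LastVisit"), (2001, "Employee", "ShirtSize"),
])
conn.executemany("INSERT INTO CustomFieldValue VALUES (?, ?, ?)", [
    (1001, 10001, "2009-02-12 08:00"),  # customer 10001
    (2001, 10001, "L"),                 # employee 10001: same ID, different entity
])

def custom_fields(entity_type, entity_id):
    """Return (FieldName, Value) pairs for one entity of the given type."""
    return conn.execute("""
        SELECT f.FieldName, v.Value
        FROM CustomFieldValue v JOIN CustomField f ON f.CustomFieldID = v.CustomFieldID
        WHERE f.EntityType = ? AND v.EntityID = ?
    """, (entity_type, entity_id)).fetchall()

print(custom_fields("Customer", 10001))
print(custom_fields("Employee", 10001))
```

The price of the single table is visible in the `EntityID` comment: the database can no longer enforce that the ID refers to an existing row.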

Have you used any of those approaches? Were you successful? What approach would you select? Do you know any other approach that I should consider?

Also, my clients want the custom field to be able to refer to data in other tables. For instance a client might want to add a "Favorite Payment Method" field to the Customer. Payment methods are defined elsewhere in the system. That brings the subject of "foreign keys" in the picture. Should I try to create constraints to ensure that values stored in the custom field tables are valid values?

Thanks

======================

EDIT 07-27-2009:

Thank you for your answers. It seems like the list of approaches is now quite comprehensive. I have selected option 2 (a single XML column). It was the easiest to implement for now. I will probably have to refactor to a more strongly defined approach as my requirements get more complex and the number of custom fields to support grows.

+3  A: 

As far as the application code is concerned, I'm unsure. I do know that custom fields benefit greatly from an EAV (entity-attribute-value) model in the database.

Per the comments below, the most significant mistake you can make with this model is putting foreign keys into it. Never ever put something like FriendID or TypeID into this model. Use this model in conjunction with the typical relational model and keep foreign key fields in table columns, where they belong.

A second significant mistake is placing data in this model that needs to be reported with every element. For example, putting something like Username in this model would mean that any time you want to access a user and need their username, you've committed yourself to a join at best, or to 2n queries, where n is the number of users you're looking at. When you consider that you are usually going to need the Username property for every User element, it becomes obvious that this too should remain in the table columns.

However, if you're just using this model for custom user fields, you'll be fine. I can't imagine many situations where a user would be entering relational data, and the EAV model is not too detrimental to searches.

Lastly, don't try to join data from this model to get a nice, pretty recordset. Grab the original record, and then grab the set of attribute records for that entity. If you find yourself tempted to join the tables, you've probably made the second mistake mentioned above.
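The "grab the record, then grab its attribute rows" advice might look like this. It is a minimal sketch, assuming SQLite via Python's `sqlite3`. The tables `AppUser` and `AppUserCustomField` and the function `load_user` are hypothetical. Username stays an ordinary column, as advised above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AppUser (UserID INTEGER PRIMARY KEY, Username TEXT NOT NULL);   -- core data in columns
CREATE TABLE AppUserCustomField (UserID INTEGER, Attribute TEXT, Value TEXT);  -- EAV rows
INSERT INTO AppUser VALUES (1, 'spencer');
INSERT INTO AppUserCustomField VALUES (1, 'FavoriteColor', 'blue'), (1, 'ShoeSize', '10');
""")

def load_user(user_id):
    # Query 1: the ordinary row with its regular columns.
    uid, username = conn.execute(
        "SELECT UserID, Username FROM AppUser WHERE UserID = ?", (user_id,)).fetchone()
    # Query 2: the entity's attribute/value rows, folded into a dict in application code
    # instead of being joined (and pivoted) into one wide recordset.
    custom = dict(conn.execute(
        "SELECT Attribute, Value FROM AppUserCustomField WHERE UserID = ?", (user_id,)))
    return {"UserID": uid, "Username": username, "Custom": custom}

print(load_user(1))
```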

Spencer Ruport
I'm using this model in a current project of mine, and it has been really beneficial. You can also generate denormalized views of the data when necessary for simple querying/data binding.
Dillie-O
Yup, when carefully applied these are pretty powerful.
Spencer Ruport
Pay close attention to the warnings in the Wikipedia article. Done poorly, entity/value pairs can murder a system, as many disaster stories scattered around the web attest. If user-defined data has to relate to more items than just the "direct parent", you are probably much better off adding that data to the model as new rows or tables. My main piece of advice would be to try very hard not to overuse such a system.
Philip Kelley
Interesting read on Wikipedia. The EAV model is certainly a double-edged sword. My +1 is for putting a name to this approach, and for the fact that the Wikipedia article is quite good. Personally, I can see lots of pitfalls in following this approach, but I can see where it would be useful. I think having an XML column is a serious alternative, though, despite what the Wikipedia article states.
RichardOD
+1  A: 

If you're developing with an object-oriented language, we're talking about adaptive object models (AOM) here. Quite a few articles have been written about how to implement them in OO languages, but there is much less information about how to design the data store side.

In the company where I work, we have solved the problem by using a relational database to store AOM data. We have a central entity table representing all the different "entities" in the domain, like people, network devices, companies, etc. We store the actual "form fields" in data tables that are typed, so we have one table for strings, one for dates, and so on. All the data tables have a foreign key pointing to the entity table. We also need tables to represent the type side, i.e. what kinds of attributes (form fields) a certain entity can have, and this information is used to interpret the data in the data tables.
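A rough sketch of the layout described above follows. It uses SQLite via Python's `sqlite3`, and all table names, the `DataKind` codes and the `set_value` helper are hypothetical. There is a central `Entity` table, an `AttributeType` table for the type side, and one typed data table per value kind, each pointing back at `Entity`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Entity (EntityID INTEGER PRIMARY KEY, EntityTypeID INTEGER NOT NULL);
-- Type side: which attributes an entity type may have, and what kind of value each holds.
CREATE TABLE AttributeType (
    AttributeTypeID INTEGER PRIMARY KEY,
    EntityTypeID    INTEGER NOT NULL,
    Name            TEXT NOT NULL,
    DataKind        TEXT NOT NULL CHECK (DataKind IN ('string', 'date'))
);
-- One typed data table per value kind, each with a foreign key to Entity.
CREATE TABLE StringData (EntityID INTEGER REFERENCES Entity(EntityID),
                         AttributeTypeID INTEGER REFERENCES AttributeType(AttributeTypeID),
                         Value TEXT NOT NULL);
CREATE TABLE DateData   (EntityID INTEGER REFERENCES Entity(EntityID),
                         AttributeTypeID INTEGER REFERENCES AttributeType(AttributeTypeID),
                         Value TEXT NOT NULL);  -- ISO-8601 text; SQLite has no DATE type
""")

# Fixed mapping from kind to table, so the table name is never taken from user input.
TABLE_FOR_KIND = {"string": "StringData", "date": "DateData"}

def set_value(entity_id, attribute_type_id, value):
    # The type side decides which typed data table receives the value.
    (kind,) = conn.execute("SELECT DataKind FROM AttributeType WHERE AttributeTypeID = ?",
                           (attribute_type_id,)).fetchone()
    conn.execute(f"INSERT INTO {TABLE_FOR_KIND[kind]} VALUES (?, ?, ?)",
                 (entity_id, attribute_type_id, value))

conn.execute("INSERT INTO Entity VALUES (1, 10)")  # 10 = hypothetical 'Person' entity type
conn.executemany("INSERT INTO AttributeType VALUES (?, ?, ?, ?)",
                 [(1, 10, "Name", "string"), (2, 10, "BirthDate", "date")])
set_value(1, 1, "Ada")
set_value(1, 2, "1815-12-10")
```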

The pros of our solution are that anything can be modeled without code changes, including references between entities, multivalued attributes, and so on. It's also possible to add business rules and validations to fields, and they can be reused in all forms. The cons are that the programming model is not very easy to understand, and query performance will be worse than with a more typical DB design. A solution other than a relational database might have been better and easier for AOM.

Building a good AOM with a working data store for it is a lot of work, and I wouldn't recommend it if you don't have highly skilled developers. Maybe one day there will be an open-source solution for these kinds of requirements.

Custom fields have been discussed before in SO:

Kaitsu
+1  A: 

Something like Option 3 is the way to go, and I have used this method previously. Create a single table to define additional properties and their corresponding values. This would be a 1-N relationship between your Customer and CustomerCustomField tables (respectively). Your second question, about defining relationships with custom properties, is something to think about. The first thing that comes to mind is adding a DataSource field, which would contain the table that the property value is bound to. So essentially your CustomerCustomField table would look like:

  1. CustomerId
  2. Property
  3. Value
  4. ValueDataSource (nullable)

This should allow you to either bind to a specific data structure or simply allow you to specify unbound values. You can further normalize this model, but something like this could work and should be easy enough to handle in code.
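A sketch of how the `ValueDataSource` column could be resolved in code. It assumes SQLite via Python's `sqlite3`. The `PaymentMethod` table, the `DATA_SOURCES` allow-list and `resolved_fields` are hypothetical. As the last comment notes, nothing in the database stops a bound value from pointing at a row that doesn't exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PaymentMethod (PaymentMethodID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE CustomerCustomField (
    CustomerId      INTEGER NOT NULL,
    Property        TEXT NOT NULL,
    Value           TEXT,
    ValueDataSource TEXT            -- NULL means an unbound, literal value
);
INSERT INTO PaymentMethod VALUES (1, 'Credit card'), (2, 'Wire transfer');
INSERT INTO CustomerCustomField VALUES
    (10001, 'FavoritePaymentMethod', '2', 'PaymentMethod'),
    (10001, 'Nickname', 'Big Spender', NULL);
""")

# Hypothetical allow-list of bindable tables: table -> (key column, display column).
DATA_SOURCES = {"PaymentMethod": ("PaymentMethodID", "Name")}

def resolved_fields(customer_id):
    result = {}
    for prop, value, source in conn.execute(
            "SELECT Property, Value, ValueDataSource FROM CustomerCustomField "
            "WHERE CustomerId = ?", (customer_id,)).fetchall():
        if source is None:
            result[prop] = value                      # unbound value, used as-is
        else:
            key_col, label_col = DATA_SOURCES[source]
            # SQLite applies numeric affinity, so the text '2' matches the integer key 2.
            row = conn.execute(
                f"SELECT {label_col} FROM {source} WHERE {key_col} = ?", (value,)).fetchone()
            result[prop] = row[0] if row else None    # dangling reference: nothing enforces it
    return result

print(resolved_fields(10001))
```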

Sergey
A: 

If those 'extra' fields are incidental and you don't need to search on them, I usually go for option 2 (but I like JSON better than XML). If there are going to be searches on custom fields, option 3 isn't hard to do, and the SQL optimizer can usually get reasonable performance out of it.
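The JSON flavor of option 2 might look like this. It is a minimal sketch with SQLite and Python's `json` module; the `Extra` column and the field names are hypothetical. Unlike the plain-text XML round trip, JSON keeps numbers as numbers.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Extra TEXT)")  # Extra holds JSON
conn.execute("INSERT INTO Customer VALUES (?, ?)",
             (10001, json.dumps({"LastVisit": "2009-02-12 08:00", "Balance": 18.26})))

# Read the blob back and decode it in application code; no searching happens in SQL here.
raw = conn.execute("SELECT Extra FROM Customer WHERE CustomerID = 10001").fetchone()[0]
extra = json.loads(raw)
print(extra["LastVisit"], type(extra["Balance"]).__name__)
```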

Javier
+1  A: 

Option 4 or 5 would be my choice. If your data is important, I wouldn't go tossing away your type information with Option 3. (You might try to implement full type-checking yourself, but it's a pretty big job, and the database engine already does it for you.)

Some thoughts:

  • Make sure your CustomFields has a DataType column.
    • Use a UDF-based check constraint on CustomFieldValues to ensure that the column specified by CustomFields.DataType is non-null.
    • You'll also want a standard check constraint to make sure you have exactly one non-null value.
  • Regarding foreign keys, I would model these as a separate DataType.
    • Each potential cross-table reference would require its own column. This is good, because it maintains referential integrity.
    • You would have to support these relationships in application code anyway, so the fact that they are hard-coded in the database does not actually limit functionality.
    • This will also jibe well with your ORM, if you're using one.
  • For Option 5, use intermediary tables to model the relationships.
    • You would still have a CustomerCustomFieldValue, but instead with only CustomerID and CustomFieldValueID columns.
  • Think long and hard about your constraints every step of the way. This is tricky stuff, and one misstep can cause utter havoc down the line.
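Here is a sketch of the constraints above, using SQLite via Python's `sqlite3`. The answer suggests a UDF-based check constraint. SQLite's CHECK cannot contain a subquery, so this sketch swaps in a different, commonly used technique: `DataType` is copied into the value table and pinned to `CustomFields` with a composite foreign key, so a plain CHECK can test it. The `payment_method` data type, the `PaymentMethodValue` column and `try_insert` are hypothetical. They illustrate "foreign keys as a separate DataType", one column per referenced table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
CREATE TABLE PaymentMethod (PaymentMethodID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE CustomFields (
    CustomFieldID INTEGER PRIMARY KEY,
    Name          TEXT NOT NULL,
    DataType      TEXT NOT NULL CHECK (DataType IN ('string', 'number', 'payment_method')),
    UNIQUE (CustomFieldID, DataType)           -- parent key for the composite FK below
);
CREATE TABLE CustomFieldValues (
    CustomerID         INTEGER NOT NULL,
    CustomFieldID      INTEGER NOT NULL,
    DataType           TEXT NOT NULL,          -- copied here so a plain CHECK can see it
    StringValue        TEXT,
    NumberValue        REAL,
    PaymentMethodValue INTEGER REFERENCES PaymentMethod(PaymentMethodID),
    PRIMARY KEY (CustomerID, CustomFieldID),
    FOREIGN KEY (CustomFieldID, DataType) REFERENCES CustomFields(CustomFieldID, DataType),
    -- the column named by DataType is filled in, and every other value column is NULL
    CHECK (
        (DataType = 'string' AND StringValue IS NOT NULL
            AND NumberValue IS NULL AND PaymentMethodValue IS NULL) OR
        (DataType = 'number' AND NumberValue IS NOT NULL
            AND StringValue IS NULL AND PaymentMethodValue IS NULL) OR
        (DataType = 'payment_method' AND PaymentMethodValue IS NOT NULL
            AND StringValue IS NULL AND NumberValue IS NULL)
    )
);
INSERT INTO PaymentMethod VALUES (1, 'Credit card');
INSERT INTO CustomFields VALUES (1, 'FavoritePaymentMethod', 'payment_method');
""")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO CustomFieldValues VALUES (?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

ok = try_insert((10001, 1, "payment_method", None, None, 1))       # valid reference
bad_fk = try_insert((10002, 1, "payment_method", None, None, 99))  # no such payment method
bad_type = try_insert((10003, 1, "string", "Visa", None, None))    # field 1 is not a string field
print(ok, bad_fk, bad_type)
```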

I am using this in an application currently in development. There haven't been any problems yet, but EAV designs still scare the daylights out of me. Just be careful.

As an aside, XML may also be a good choice. I don't know as much about it from direct experience, but it was one of the options I considered when starting the data design, and it looked pretty promising.

WCWedin
+2  A: 

I agree with posters below that Options 3, 4, or 5 are most likely to be appropriate. However, each of your suggested implementations has its benefits and costs. I'd suggest choosing one by matching it to your specific requirements. For example:

  1. Option 1 pros: Fast to implement. Allows DB actions on custom fields (searching, sorting).
    Option 1 cons: Custom fields are generic, so there are no strongly typed fields. The table is inefficient size-wise, with many extraneous fields that will never be used. The number of custom fields allowed must be anticipated.
  2. Option 2 pros: Fast to implement. Flexible, allowing an arbitrary number and type of custom fields.
    Option 2 cons: No DB actions are possible on custom fields. This is best if all you need to do is display the custom fields later, or do minor manipulations of the data on a per-Customer basis.
  3. Option 3 pros: Both flexible and efficient. DB actions can be performed, and the data is normalized somewhat to reduce wasted space. I agree with unknown (google)'s suggestion that you add an additional column that can be used to specify type or source information.
    Option 3 cons: A slight increase in development time and in the complexity of your queries, but there really aren't too many cons here.
  4. Option 4 is the same as Option 3, except that your typed data can be operated on at the DB level. Adding type information to the link table in Option 3 lets you do more operations at your application level, but the DB won't be able to do comparisons or sorting, for example. The choice between 3 and 4 depends on this requirement.
  5. Option 5 is the same as 3 or 4, but with even more flexibility to apply the solution to many different tables. The cost in this case is that this table will grow much larger. If you are doing many expensive join operations to get to your custom fields, this solution may not scale well.
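To make the sorting point in Option 4 concrete, here is a small sketch. It uses SQLite via Python's `sqlite3`, and the table names `AsText` and `AsTyped` and the third value are hypothetical. It compares ORDER BY on numbers stored as strings, as in Option 3, with the same numbers in a typed column, as in Option 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AsText  (CustomerID INTEGER, Value TEXT);         -- Option 3: everything is a string
CREATE TABLE AsTyped (CustomerID INTEGER, NumericValue REAL);  -- Option 4: typed column
""")
data = [(10001, 18.26), (10002, 50.26), (10003, 100.5)]
conn.executemany("INSERT INTO AsText VALUES (?, ?)", [(c, str(v)) for c, v in data])
conn.executemany("INSERT INTO AsTyped VALUES (?, ?)", data)

# Text comparison is character by character, so '100.5' sorts before '18.26'.
text_order = [r[0] for r in conn.execute("SELECT CustomerID FROM AsText ORDER BY Value")]
typed_order = [r[0] for r in conn.execute("SELECT CustomerID FROM AsTyped ORDER BY NumericValue")]
print(text_order)
print(typed_order)
```

With Option 3 you would need to cast on every query (and trust that every stored string really is a number) to get the numeric order that Option 4 gives for free.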

P.S. As noted below, the term "design pattern" usually refers to object-oriented programming. You're looking for a solution to a database design problem, which means that most advice regarding design patterns won't be applicable.

Eric Nguyen
I accept this answer because I think it helps choosing a solution.
Sly