tags:

views:

302

answers:

8

If it ok and recommanded to use a xml column to store any extra data that the user interface might provide?

For example, suppose an Employee table

CREATE TABLE Employee
(
    EmployeeId int not null,
    Name nvarchar(300) not null,
    Phone varchar(30) null,
    Email varchar(320) null,
    Address nvarchar(max) null,
    Data xml null
)

Data could contains many values like additional phone numbers, comments ...

We expect that all our customer will be asking for different fields in Employee, and we do not want to mess with the database structure every time they think of a new field to add.

We expect the data stored in the xml column to be infrequently accessed data. Beside being vieweable while browsing the list of employee, the data might need to be printed. So we do need to store the data type along the data (a little like a dataset is serializing its data)

Is this is a good approach for storing unknown extra data?

Edited Gave a better example

Update I haven't selected an answer yet because I'm discussing with my team of the different approach you guys suggested.

+5  A: 

You can do it like this:

[EmployeeField]
-----------------------------------------------------
EmployeeID    EmployeeFieldName    EmployeeFieldValue

It's in fact the same approach .NET Membership uses for user profiles with dynamic fields. Many business applications use the same approach to store dynamic customer-specific inputs.

Credited to Spencer Ruport's comment, this approach is known as the Entity-Attribute-Value model (EAV).

Developer Art
This is more commonly known as an Entity Attribute Value (EAV) model. Be very careful with it as there are some considerable drawbacks in performance if used to store the wrong kind of information.
Spencer Ruport
+1 for Spencer Ruport's comment. Note that it doesn't make this _wrong_ (+1 for the answer, too), just that you have to be careful with it.
Joel Coehoorn
@Spencer Ruport: Thanks for the term.
Developer Art
I use this model regularly, it's very flexible. One caveat is that you need a plan for how you will store and retrieve non-string fields like dates. That means either using another table or column just for date fields, or having some solid business logic for storing dates in text fields (which can be done, you can even sort and query them if you use a yyyy-mm-dd style format).
richardtallent
So you recommand to use a child table to store that extra information. But what if this idea works and the customers want to also be able to add fields to the Suppliers, to the Equipments, to the Parts or to any table that list something. Should we make such EAV table for each table that require one, or make a big central one that can store any information from any table? That's getting more complex that using a simple Xml column in a table where it is needed.
Pierre-Alain Vigeant
And one way or another, you still need to pivot that EAV table for each extra field that is needed.
Pierre-Alain Vigeant
A: 

Depends on what the data is. If it is not easy to model in a relational structure, then what you propose (it is really just a Serialized LOB) is OK.

RichardOD
+2  A: 

You are cutting yourself off at the knees using a relational database model for this. Relational databases are built around the idea of a static structure of data and being able to query easily and quickly. Even adopting a EAV structure as "New in town" suggests goes against this, though it's arguably better than a simple XML data dump into the column.

If customer is an oddity and the rest of your data is OK, then it's probably fine to do this, though I would definitely go with the EAV approach. If most of your tables look this way, then it's time to rethink your approach to data storage.

Adam Robinson
RDBMSs may be built around that idea, but business requirements usually cannot accept that engineering convenience.
Developer Art
I don't know: I think I'd prefer Xml to something like a Blog structure, though. After all, you _can_ query xml in sql server to some extent.
Joel Coehoorn
@New in town: That's hardly true. There are certainly cases--yours may well be one of them--where the business requirements preclude it, but to say that they "usually cannot accept that engineering convenience" is entirely untrue.
Adam Robinson
@Joel: You can query XML, and I will admit that I am more familiar with traditional normalized data structures so I can't speak from personal experience on performance, but others have expressed that XML sacrifices performance for flexibility, and EAV would seem to acommodate the requisite flexibily while maintaining SOME of the performance.
Adam Robinson
+1  A: 

I personally prefer to add extra, unknown data as a separate table. This allows you to allow an infinite number of options (just have ID, Name, Type(? - optional), and Data columns), but provides the added benefit of allowing selective updates/deletes.

If you do it as a single data blob, any time you change any single portion of the aux data, you'll have to replace the entire data field.

Reed Copsey
You are right about the blob update.
Pierre-Alain Vigeant
A: 

If you want to use the data in a query, I wouldn't recommend putting it into an xml field. Yes, there are ways to query data from xml fields, but I've not found them as effecient.

Brad Bruce
A: 

Yes, it is OK. Whether it is recommended depends heavily on your specific situation. We do something similar using Oracle and the flexibility is great but our custom web framework (Java servlet that produces web pages/apps based on modules written in XML) makes heavy use of XML and we have specific systems in place to deal with storing data in a single XML column so my opinion and experience is based on scenarios where it is convenient.

For example (and I know this might sound horrid to people), if you are going to be searching based on data inside your XML this might initially pose a problem because running XPaths and extracting data from the XML on every row every time you search is a massive performance hit. We have a system in place similar to materialised views which utilizes triggers and a stored query. Any time a row in the basetable (containing the XML column) is altered, the triggers fire, your query is run and extracts data out from the XML and inserts it into a relational table which in turn has a view over it to ensure you don't modify the relational data (since this does not reflect back into the XML).

For the purposes of something like large numbers of web based forms doing CRUD operations where the schema and thus data model may often change it is fantastic. It means you could pull out your XML fragment and instantly have the underlying model for your page and saving it is as simple as simply sticking the XML fragment back in. For fast read-only access we have instant access to a relational view of the XML.

ChrisCM
A: 

I assume you'll also have to query and search for Employee attributes in the XML.

It can be OK if you use an XML schema and create necessary XML indexes. You can also use a EAV relational model and will probably be more performant. You must read the Best Practices for Semantic Data Modeling for Performance and Scalability white-paper if you are going to implement an EAV model, this white-paper written by the SQL Customer Advisory Team deals with common pitfalls and problems that occur with badly designed EAV models.

As a side note, storing only ID and NAME for an entity as rich in attributes as an employee sounds like you're really ditching the ball short of the goal. You should at least add the common attributes you will expect your majority of customers to use and only rely on extensible models (XML or EAV) for the extra attributes that you can't forsee.

Update

Since we're at it, here are the SQL CAT white-papers on XML best practices too:

Remus Rusanu
The few table column was for the example purpose. The most common fields are present in the table.
Pierre-Alain Vigeant
A: 

Yes, provided the extra data does not “link” to any other table, and you don’t need to often search on it. If you will every have more then one user editing different parts of the extra data for the same employee, then think very carefully about your design.

So if your extra data is just a “notes” fields, and happens to have per customer predefined headings then it is ok.

There no advantage to putting this extra data in a [EmployeeField] table, as it just makes the data access code more complex without enabling you to take advantage of the power of the database.

Rather the storing the type with each data item, I think you should have a meta data table that stores the type, name, and “display name” of each bit of extra data a given customer wishes to have. You may also need to store the form layout the customer wishes to use to enter the extra data.

Ian Ringrose