Modern RDBMS's have support for XML column types and functionality for dealing with XML in stored procedures. Historically I would always have mapped hierarchical data (whether of OO objects or XML) to relational tables. Given the widespread database support for XML should I change my ways?
views:
532answers:
11You can store user generated XML in there.
If a website like stackoverflow used some sort of XML markup instead of mark down you could store the question/answers as XML in the database. You might find yourself trying to parse this user generated XML looking for proprietary tags.
Let's say you have an entity that has attributes. You can store all those attributes in XML instead of creating a separate attribute table. The XML would be more flexible.
If you don't see the need then don't change!
Sometimes you have to persist data that does not have a known structure, or its structure is very volatile. In those cases, instead of creating a table, just save the XML into your existing table
Flexibility is one reason.
If the structure of your data can vary, then you can still keep a common RDBMS table, along with the queries, etc. that gor after it with the somewhat variably structured data.
If you need to add a field at some point, you can do so without changing your RDMS table structure, and thus not break everyone else's queries.
You can process XML data directly in SQL server. E.g. you can apply XPath expressions and just send the filtered result set to the client. SQL server features can build upon XML processing capabilities later.
The features above exist from MS SQL Server 2000 or 2005.
For example, you get XML documents from some other system, with very rich or complex structure that you want to store; but you only need a few well-defined queries to retrieve that data. In that case, just parse the data you need to generate some indexes, and store the whole XML structure in a single field.
To do that you don't need much XML-specific support on the DB engine, but it still helps to keep queries expressive.
Besides that, I'd guess some DMBS with good XML support could let you simply store the XML document, maybe without specifying how to index it. You just use XQuery and hope it somehow optimises to your needs.
I have a good real-life example. One of my clients receives an XML file from their suppliers very often with some important data. It is deeply nested. They need to compare it with the previous XML file to see what has changed. Without XML support in database, I had to build a tool that iterates through XML nodes and looks for matches in the tables of relational database. I could use some XML-XML comparison tool, but some of the checks relate to some other data that did not come from XML file and I need to join all that together. Ok, all of this is not that big deal, but still - with XML databases you get that functionality out-of-the-box.
Here is a real world example from a system I work on. We have a core system and create customer-specific code in java. A different class may be called depending on which customer is transacting. Sometimes this custom code needs to store something and we put it into an XML column in the relevant table. This saves us from modeling everything under the sun. Adding a new customer generally just means writing and installing the java code.
The downside is that reporting, querying and updates are more difficult on the XML column. There are none of the usual good database features like check constraints, etc.
I've not had the need to store XML so far, but I frequently use the ability to return XML from a stored procedure. It makes some things very useful - mainly reports. I can run a SP to generate the report, send back the results in XML, and then use XSLT to display the result on the site very easily.
I use the XML column type to stash a copy of all business-critical messages we receive from a third-party service. It's very handy for a few reasons.
1) In case of data corruption, we can back-trace to see what data came in when, and in what format.
2) Future development work on systems can be based on real data from the log table - just deserialise and use the data as if it came from a call to the 3p service
3) Makes sure the infrasructure guys are busy allocating disk space to the DB server. ;)
The only reason I would ever use it again is when for extensibility & flexibility.
The overhead of xml (xpath) and maintenance (namespaces) are really not worth the hassle if you can avoid it. We have previously stored large amounts of data in xml and used scalar functions to retrieve it, but it is too slow and causes immense headaches is the xml structure or namespace changes.
But the flexibility is fantastic. You can add new properties whenever you want, you can have project/client/job specific data in there which doesn't require proper columns. The XML does not have to be in a static structure - you simply need a factory which can spawn instances to deal with the different XML (which would need to be related to a project/client/job).
When adding a new table to an existing system, especially one that has a lot of existing data and can't easily be modified, I will add an XML column. In the future if I ever need to add another column to that table, I can simply utilise the XML column rather than being frustrated and having to do a lot of rework.
In summary, you don't start out by putting essential properties in XML. But you should add XML when you know that your table might need to be extended, precisely because it gives you the option of extending.