views:

179

answers:

3

(This post is soliciting personal experiences about storing XML; please share what you know. :-) )

I am working on a service application that communicates with an external service using XML. I am planning to use SQL Server 2008 to store the XML that is received and sent to the external service. I'm exploring my options for storing the XML in the database. The three options that I have identified are:

  1. Store the XML in an XML data type column
  2. Create tables to store the various parent and child relationships represented in the XML.
  3. A hybrid of the two above approaches where the original XML is stored in an XML data type column but several fields from the XML broken out into their own columns to simplify querying and indexing.

I'm looking for any advice, based on your personal experience, with storing and retrieving XML data in SQL Server.

Some additional background: I've used an 'xsd.exe' equivalent called XsdObjectgenerator to create .net classes based on the XML schemas. When the service receives the XML file, it is deserialized into an instance of .net class. This instance is used to perform the operations of the service. My original plan was to then use option #1 above to store the XML. If I needed to update or report on the data, I would simply deserialize the db record back into one of my .net classes.

Although this approach works and makes working with the xml very simple, I have concerns that as the volume of data increases, the performance of querying XML data type records will decrease. This is why I've explored options 2. & 3. above.

In addition to storing the XML, the XML will be queried for use in both reports and a separate web application. The db records will be queried, sorted, filtered, grouped, summaried and possibly updated by the end users.

+5  A: 

I guess it depends on what you want to do with your XML in your database.

If you're mostly only storing it, and possibly retrieving it later as a whole and sending it out again, then I'd definitely use the XML datatype - no point in shredding it into bits and pieces.

If you however need to mostly work with the contents of the XML file, and possibly also manipulate and change that content, then it might be advisable to create tables with columns to match your XML content, and shred it when storing it, use it, and when you need to, reassemble it from the relational pieces using something like SELECT (columns) FROM dbo.Table FOR XML.....

There is an overhead involved in shredding and reassembling - so you need to ask yourself if that's worth doing. But there's also an overhead involved if you need to manipulate the XML column too much.

If you only need read-only access to a few attributes in your XML, I've come to appreciate the ability to wrap those into a UDF and surface it as a computed column in your table. That way, you can easily select something from your table, based on values that are stored somewhere inside your XML - quite handy! But do not overuse this approach - works fine for 2, 3 attributes - but if you need to access your XML over and over again (and most or all of it), then you might be better off shredding it into relational pieces to begin with.

marc_s
Thanks for your helpful response.
CletusLoomis
A: 

While continuing to explore solutions, a collegue forwarded the following applicable links:

Some preliminary conclusions from these articles and other research:

  • While working with the xml datatype in SQL Server is flexible, querying large volumes of data will be slow as you are essentially querying a blob datatype.
  • While you can create indexes on xml datatype columns in Sql Server, the index is on the whole column and not on a particular element or attribute, so the indexes are not as effective as an index on a non-xml db column.
  • Storing the xml in raw form in an xml datatype field while also maintaining a parsed version of the data in either relational tables or denormalized flat table(s) for querying and reporting is beginning to emerge as the most flexible solution. The xml can be "shredded" into the querying tables either at runtime or after the fact by a separate service or thread.

I will be mocking up each solution with test data and performing some benchmarking. I'll post the results here once they are available.

CletusLoomis
+1  A: 

A few jobs back (SQL 2000), we were storing XML as TEXT data, and our databases became significantly bloated--not so much with the data as with the tags used to identify it. I did some testing, and pkzip (I did say it was several jobs ago) crunched all the data down to 3% of it's original size.

Advice #1: Identify how long you need to store the data, and if/when possible archive old data out.

Advice #2: If you are using SQL 2008, look into the data compression options for the XML columns.

(Might not be relevant if your XMLs are short, but ours were all in the kbs and 10kbs.)

Philip Kelley
Thanks for your response.
CletusLoomis