views:

89

answers:

3

In relation to a discussion started at this question, I've decided to put this up as a community wiki question.

The root of the question is, therefore, is it appropriate to store XML data in a relational database? Are there generally better ways to implement the same goal? What database engines provide good support for XML data types (such as SQL Server), and what are the issues surrounding so-called "XML indexes"?

+3  A: 

Databases are for storing data. XML is data. Therefore, under the right circumstances it's perfectly valid to store XML in a database. Whether that's the most efficient thing to do depends on a lot of factors that probably can't be generalized.

For example, if you have a structured XML document that represents an object (eg: a book in a bookstore), it likely makes sense to parse the data and store in in appropriate rows and columns in a database designed for that data.

OTOH, imagine a database holding code samples. You have columns for language, description, and the code. In the case of XML, obviously you'll store the XML in the code column.

So, like so many things in software, "it depends".

Bryan Oakley
It used to be that XML in a RDBMS was opaque and unqueryable. This has been less and less the case over the last decade.
Steven Sudit
@Steven Sudit: excuse my ignorance, but how does an RDBMS even know it's XML? It's just a bunch of bytes. Why would plain ascii text be searchable but plain ascii text that happens to look like XML not be searchable?
Bryan Oakley
@Bryan: The column is explicitly declared as type `xml`, as opposed to just `nvarchar(max)`, and can even have a schema associated with it. (This is for MS SQL Server 2008, but other databases have similar functionality. For an overview for this product, see http://www.microsoft.com/sqlserver/2008/en/us/wp-sql-2008-whats-new-xml.aspx)
Steven Sudit
To be more clear, while SQL could always search text as text, newer servers can search XML as XML, using XPath and related technologies. Hope that helps.
Steven Sudit
@Steven Sudit: ah, I see. I wasn't talking about being able to search in the XML except as you would any raw text field.
Bryan Oakley
@Bryan: The nice part is that you can write a stored procedure that takes an XML input containing both known elements and novel ones, then semi-automatically shred the XML so that the known elements are used to fill pre-defined fields while the remnant XML goes into a catch-all field. On the way back, the two are combined seamlessly. The net effect is substantial flexibility with regard to the table definition. It's good stuff.
Steven Sudit
Here's a SO link for that: http://stackoverflow.com/questions/61233/the-best-way-to-shred-xml-data-into-sql-server-database-columns
Steven Sudit
A: 

it actually depends on what kind of data you are storing in your DB, its normal for exemple to store XHTML information on the database.. one of the principes of a relational database is that the information is atomic, i mean, you should store something like

name    |   professions
----------------------
Clark   |  writer, journaliste, superhero

so i'd be against to store something like

name    |   information
----------------------
Clark   |  <profession> writer </profession><profession> journaliste </profession><profession> superhero </profession>
pleasedontbelong
+1  A: 

One problem with supporting XML in an RDBMS is that there are AFAIK no universally accepted rules for comparison of XML documents. A relational database can in principle store any attribute value that can support assignment and comparison - this being essential to relational projection for example because the values of each attribute must be comparable to each other. Comparison is no problem for most types: strings, numbers, binary etc. It's potentially more difficult for a XML document type.

Some SQL DBMSs (ie non-relational DBMSs) simply don't permit comparison of XML values at all. For instance Microsoft SQL Server allows XML-typed columns but they cannot be compared and so among other things SELECT DISTINCT is not supported.

dportas