views:

65

answers:

2

Hi All,

What are the best practices for working with Sql server Xml columns to ensure quick performance and also ease of reporting?

How do you set up the column? do you leave it as untyped? or associate it with a schema?

Does associating the xml column with a schema improve query performance?

Our use of xml columns is as follows:

A.> On a PER customer basis we can define flexible storage of their data without overhauling our db.

B.> We need to build reporting views for each customer which returns their data as if it was a simple table (for crystal reports or Sql Server Reporting Services).

The syntax we currently use to query is as follows:

SELECT 
Id, 
doc.value('@associatedId','nvarchar(40)') as AssocId,
doc.value('@name1', 'nvarchar(255)') as Name1,
doc.value('@name2', 'nvarchar(255)') as Name2,
doc.value('@name3', 'nvarchar(255)') as Name3,
doc.value('@number', 'nvarchar(255)') as Number
From OrderDetails
CROSS APPLY OrderDetails.XmlData.nodes('//root/reviewers/reviewer') as XmlTable(doc)

Is there a quicker way to do this? this query runs slowly for us in a table with 1million records, but only 800 currently have xml data!

Thanks

Pete

+3  A: 

From XML Best Practices for Microsoft SQL Server 2005:

Use a typed or untyped XML?

Use untyped XML data type under the following conditions:

  • You do not have a schema for your XML data.
  • You have schemas but you do not want the server to validate the data.

This is sometimes the case when an application performs client-side validation before storing the data at the server, or temporarily stores XML data invalid according to the schema, or uses XML schema features not supported at the server (for example, key/keyref).

Use typed XML data type under the following conditions:

  • You have schemas for your XML data and you want the server to validate your XML data according on the XML schemas.
  • You want to take advantage of storage and query optimizations based on type information.
  • You want to take better advantage of type information during compilation of your queries such as static type errors.

Typed XML columns, parameters and variables can store XML documents or content, which you have to specify as a flag (DOCUMENT or CONTENT, respectively) at the time of declaration. Furthermore, you have to provide one or more XML schemas. Specify DOCUMENT if each XML instance has exactly one top-level element; otherwise, use CONTENT. The query compiler uses DOCUMENT flag in type checks during query compilation to infer singleton top-level elements.

Does associating the xml column with a schema improve query performance? See above point: use typed XML if you want to take advantage of query optimizations based on type information.

There is also a lengthy discussion over the benefits of XML indexes:

Your application may benefit from an XML index under the following conditions:

  • Queries on XML columns are common in your workload. XML index maintenance cost during data modification must be taken into account.
  • Your XML values are relatively large and the retrieved parts are relatively small. Building the index avoids parsing the whole data at runtime and benefits index lookups for efficient query processing.

And most importantly, the appropriate type of secondary XML index for your usage:

  • If your workload uses path expressions heavily on XML columns, the PATH secondary XML index is likely to speed up your workload. The most common case is the use of exist() method on XML columns in WHERE clause of Transact-SQL.
  • If your workload retrieves multiple values from individual XML instances using path expressions, clustering paths within each XML instance in the PROPERTY index may be helpful. This scenario typically occurs in a property bag scenario when properties of an object are fetched and its relational primary key value is known.
  • If your workload involves querying for values within XML instances without knowing the element or attribute names that contain those values, you may want to create the VALUE index. This typically occurs with descendant axes lookups, such as //author[last-name="Howard"], where <author> elements can occur at any level of the hierarchy and the search value ("Howard") is more selective than the path. It also occurs in "wildcard" queries, such as /book [@* = "novel"], where the query looks for <book> elements with some attribute having the value "novel".
Remus Rusanu
wow a really thorough answer on this one remus thank you. Thats definately given me food for thought and lots of keywords and search terms to further my research.
Peter
+2  A: 

If as in the above example you are using the XML to store various string columns, I don't think you would really benefit from typed XML unless you have a need for the server to validate the data. Performance-wise, I suspect it would be faster untyped.

For these kinds of queries you absolutely need to have XML indexes in place, they are essential for good performance of XML queries. Without indexes, XML columns are stored as blobs so in order to query them, SQL needs to shred the blob into XML first, then do whatever operations you are requesting. A primary XML index stores the shredded XML in the database so it doesn't need to be done on the fly. You need to create a primary XML index first, then secondary XML indexes can be created to support your queries.

There are 3 types of secondary XML indexes: PATH, VALUE and PROPERTY. Which secondary indexes you need depends on the type of queries you're going to be doing, so I would encourage you to review the Secondary XML Indexes topic in Books Online to decide which one(s) would be useful to you: http://msdn.microsoft.com/en-us/library/bb522562(SQL.100).aspx

bluefooted
Thanks again for the info blue and for the link +1.
Peter