views:

241

answers:

3

I have a DB that is going to keep log entries.
One of the columns in the log table contains serialized (to XML) objects and a guy on my team proposed to go with XML data type rather than NVARCHAR(MAX). This table will have logs kept "forever" (archiving some very old entries may be considered in the future). I'm a little worried about the CPU overhead, but I'm even more worried that DB can grow faster (FoxyBOA from the referenced question got 70% bigger DB when using XML).

I have read this question http://stackoverflow.com/questions/514827/microsoft-sql-server-2005-2008-xml-vs-text-varchar-data-type and it gave me some ideas but I am particulairly interrested in clarification on whether the DB size increases or decreases.

Can you please share your insight/experiences in that matter.

BTW. I don't currently have any need to depend on XML features within SQL Server (there's nearly zero advantage to me in the specific case). Ocasionally log entries will be extracted, but I prefer to handle the XML using .NET (either by writing a small client or using a function defined in a .NET assembly).

A: 

What SQL Server version and edition?

Enterprise Edition has data Page compression.
SQL Server 2008 R2 has Unicode Compression.

Both of them can have significant impact on the size of the resulted data on disk when using NVARCHAR(MAX).

XML stores itself internally as serialized binary, and the efficiency of its storage is largely dependent on the XML schema used (name of elements and attributes).

Remus Rusanu
+3  A: 

If you do have XML, and you can be sure it's always XML, I would definitely recommend going that way. SQL Server stores XML in an optimized format - you don't even need any XML indices on that to benefit from it.

If you insert 5000 rows of a 5KB XML into an XML column, you get roughly 1250 pages = 9 MB. Inserting the same 5000 rows with the same 5KB XML into NVARCHAR(MAX) uses over 3700 pages or 29 MB - quite a difference!

And that difference should be even more pronounced if you can associate your XML with a XML schema stored in SQL Server. Plus you're also guaranteed that the XML stored conforms to a schema - can be very helpful at times! Can't do that with a plain NVARCHAR(MAX) column...

And I don't agree that using XML over NVARCHAR(MAX) has any performance penalty - quite the contrary. Since you're potentially retrieving less data from SQL Server when you're about to display or fetch the content, I would argue it's even a tad faster than NVARCHAR(MAX).

marc_s
But this is for log entries, so presumably you'd want to write without the indexing overhead (which could be substantial if the serialized object is large) even if it made the read much slower?
MusiGenesis
Yes, sure - as long as you only need to show the whole XML (when you need to look at a logging entry), then XML indices don't really help much - but cost in overhead, for sure!
marc_s
A: 

This benchmark shows XML data type using less IO but a little more CPU than VARCHAR(MAX). I would think NVARCHAR(MAX) would take even more IO since it's unicode.

http://searchsqlserver.techtarget.com/tip/XML-data-type-in-SQL-Server-2005-vs-VARCHAR-MAX

Revindex