What is the best way to shred XML data into various database columns? So far I have mainly been using the nodes and value functions like so:

INSERT INTO some_table (column1, column2, column3)
SELECT
Rows.n.value('(@column1)[1]', 'varchar(20)'),
Rows.n.value('(@column2)[1]', 'nvarchar(100)'),
Rows.n.value('(@column3)[1]', 'int')
FROM @xml.nodes('//Rows') Rows(n)

However, I find that this gets very slow for even moderately sized XML data.
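To make the shape of the data concrete, here is a minimal Python sketch of the same shredding step, assuming each Rows element carries its values as attributes; the element and attribute names mirror the query above and are otherwise illustrative:

```python
import xml.etree.ElementTree as ET

# Sample payload shaped like the XML the query above expects:
# each <Rows> element carries the column values as attributes.
doc = """
<Root>
  <Rows column1="abc" column2="hello" column3="1"/>
  <Rows column1="def" column2="world" column3="2"/>
</Root>
"""

def shred(xml_text):
    """Extract one (column1, column2, column3) tuple per Rows node."""
    root = ET.fromstring(xml_text)
    return [
        (n.get("column1"), n.get("column2"), int(n.get("column3")))
        for n in root.iter("Rows")  # like the '//Rows' path: matches at any depth
    ]

rows = shred(doc)
```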

+2  A: 

I'm not sure whether it is the best method, but I have used the OPENXML construction:

DECLARE @DocHandle int
EXEC sp_xml_preparedocument @DocHandle OUTPUT, @XmlDocument

INSERT INTO Test
SELECT Id, Data
FROM OPENXML (@DocHandle, '/Root/blah', 2)
WITH (Id   int         '@ID',
      Data varchar(10) '@DATA')

EXEC sp_xml_removedocument @DocHandle

To speed it up, you can create XML indexes; a secondary XML index of type VALUE specifically optimizes value() lookups. You can also use typed XML columns, which perform better.

aku
Espo, thank you for the correction. My English sucks terribly.
aku
A: 

This isn't an answer so much as an addition to this question: I have just come across the same problem, and I can give the figures that edg asked for in the comments.

My test uses XML that results in 244 records being inserted, so 244 nodes.

The code that I am rewriting takes on average 0.4 seconds to run (10 test runs, ranging from 0.344 to 0.56 seconds). Performance is not the main reason the code is being rewritten, but the new code needs to perform as well or better. The old code loops over the XML nodes, calling a stored procedure once per loop to do the insert.

The new code is pretty much just a single stored procedure: pass the XML in and shred it.

Tests with the new code switched in show that the new stored procedure takes on average 3.7 seconds, almost 10 times slower.
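The contrast between the two patterns being compared, one insert call per node versus a single batched statement, can be sketched outside SQL Server. A minimal Python illustration, using sqlite3 as a stand-in and made-up table and data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (column1 TEXT, column2 TEXT, column3 INT)")

records = [("a%d" % i, "b%d" % i, i) for i in range(244)]  # 244 rows, as in the test above

# Old pattern: one round trip per node (a stored-procedure call per loop iteration).
for rec in records:
    conn.execute("INSERT INTO some_table VALUES (?, ?, ?)", rec)

# New pattern: shred everything first, then insert as one batch.
conn.executemany("INSERT INTO some_table VALUES (?, ?, ?)", records)

count = conn.execute("SELECT COUNT(*) FROM some_table").fetchone()[0]
```

This only illustrates the two call shapes; as the timings in this answer show, batching the statement does not by itself guarantee the XML shredding inside it is fast.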

My query is in the form posted in this question:

INSERT INTO some_table (column1, column2, column3)
SELECT
Rows.n.value('(@column1)[1]', 'varchar(20)'),
Rows.n.value('(@column2)[1]', 'nvarchar(100)'),
Rows.n.value('(@column3)[1]', 'int')
FROM @xml.nodes('//Rows') Rows(n)

The execution plan appears to show that for each column, SQL Server does a separate "Table Valued Function [XMLReader]" returning all 244 rows, then joins everything back up with Nested Loops (Inner Join). So in my case, where I am shredding into about 30 columns, this appears to happen 30 separate times.

I am going to have to dump this code; I don't think any optimisation will get over this method being inherently slow. I am going to try the sp_xml_preparedocument/OPENXML method and see whether its performance is better. If anyone comes across this question from a web search (as I did), I would highly advise doing some performance testing before using this type of shredding in SQL Server.

DannykPowell
A: 

There is an XML Bulk Load COM object (.NET example).

From MSDN:

You can insert XML data into a SQL Server database by using an INSERT statement and the OPENXML function; however, the Bulk Load utility provides better performance when you need to insert large amounts of XML data.

Si
A: 

My current solution for large XML sets (> 500 nodes) is to use SQL bulk copy (System.Data.SqlClient.SqlBulkCopy): I use a DataSet to load the XML into memory and then pass the table to SqlBulkCopy (defining an XML schema helps).

Obviously there are pitfalls, such as needlessly using a DataSet and loading the whole document into memory first. I would like to go further in the future and implement my own IDataReader to bypass the DataSet step, but currently the DataSet is "good enough" for the job.
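The IDataReader idea, streaming rows to the bulk insert instead of materialising everything in a DataSet first, can be sketched in Python, with iterparse as the streaming reader and sqlite3 standing in for SqlBulkCopy (element, attribute, and table names are illustrative):

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

doc = b"""<Root>
  <Rows column1="abc" column2="hello" column3="1"/>
  <Rows column1="def" column2="world" column3="2"/>
</Root>"""

def stream_rows(fileobj):
    """Yield one tuple per Rows element as it is parsed, never holding the whole tree."""
    for _, elem in ET.iterparse(fileobj, events=("end",)):
        if elem.tag == "Rows":
            yield (elem.get("column1"), elem.get("column2"), int(elem.get("column3")))
            elem.clear()  # free the shredded node instead of keeping it in memory

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (column1 TEXT, column2 TEXT, column3 INT)")

# executemany consumes the generator lazily, so rows flow straight from
# the parser to the insert -- the streaming shape an IDataReader would give.
conn.executemany("INSERT INTO some_table VALUES (?, ?, ?)", stream_rows(io.BytesIO(doc)))

inserted = conn.execute("SELECT COUNT(*) FROM some_table").fetchone()[0]
```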

Basically, I never found a solution to my original question regarding the slow performance of that type of XML shredding. It could be slow due to the typed XML queries being inherently slow, or due to something to do with transactions and the SQL Server log. I guess the typed XML functions were never designed to operate on non-trivial numbers of nodes.

XML Bulk Load: I tried this and it was fast, but I had trouble getting the COM DLL to work in 64-bit environments, and I generally try to avoid COM DLLs that no longer appear to be supported.

sp_xml_preparedocument/OPENXML: I never went down this road, so I would be interested to see how it performs.

eddiegroves
A: 

A dynamic method of placing SQL-generated XML back into a relational table: http://www.sql-library.com/?p=94

Harry
A: 

Curious. I've shredded gigabytes of XML into tables in a couple of seconds.

Josh
Care to elaborate on the method?
reallyJim