tags:

views:

690

answers:

2

I have the following XML that I'm running in SQL Server, and it breaks, why?

 declare @xml varchar(max)
 declare @hDoc int

 set @xml = '<transaction>
 <item itemId="1" value="Hello World" />
 <item itemId="2" value="Hello &World" />
 <item itemId="3" value="Hello <World" />
 <item itemId="4" value="Hello >World" />
 <item itemId="5" value="Hello ’World" />
 </transaction>'

 exec sp_xml_preparedocument @hDoc OUTPUT, @xml

 select
     itemId
 ,   value
 from
     openxml(@hDoc, '/transaction/item')
    with (
      itemId int,
      value varchar(max)
     ) item
+1  A: 

The values in the XML contain invalid characters. For XML in general you must escape the less than sign and the ampersand like so: &lt; and &amp;

 declare @xml varchar(max)
 declare @hDoc int

 set @xml = '<transaction>
 <item itemId="1" value="Hello World" />
 <item itemId="2" value="Hello &amp;World" />
 <item itemId="3" value="Hello &lt;World" />
 <item itemId="4" value="Hello >World" />
 </transaction>'

 exec sp_xml_preparedocument @hDoc OUTPUT, @xml

 select
     itemId
 ,   value
 from
     openxml(@hDoc, '/transaction/item')
    with (
      itemId int,
      value varchar(max)
     ) item

However when using openxml certain values won't work in general, specifically that curly apostrophe. I'm not sure what values are invalid, but I know that is one of them. So the solution is to use the native XML type in SQL Server 2005.

 declare @xml xml
 set @xml = '<transaction>
 <item itemId="1" value="Hello World" />
 <item itemId="2" value="Hello &amp;World" />
 <item itemId="3" value="Hello &lt;World" />
 <item itemId="4" value="Hello >World" />
 <item itemId="5" value="Hello ’World" />
 </transaction>'

 select
       item.value('@itemId', 'int')
     , item.value('@value', 'varchar(max)')
 from @xml.nodes('/transaction/item') [transaction](item)
Tom Ritter
Close enough, but it's a little more complicated, too. Xml spec defines a very large swath of unicode as 'allowed', and anything outside of that is not allowed and must be escaped. Since unicode is very large, there are still a LOT of other characters not mentioned here that must be escaped.
Joel Coehoorn
You mean characters that won't work with openxml like the curly apostrophe, or you mean characters like ampersand and less than that must be encoded? I thought only those two and "->" had to encoded for XML...
Tom Ritter
A: 

If possible, assemble your XML document in the application. The framework (.Net, Java, ...?) will do the proper oncoding for you, and will yield valid XML.

cdonner