tags:

views:

1100

answers:

6

I have an XML stored procedure in MS SQL 2005 which I use the SqlCommand.ExecuteXmlReader to get an XmlReader, then parse through the data and form an XML document. The problem is that the data in SQL contains some binary characters which are illegal within a UTF-8 XML document, so an exception is thrown.

Has anyone else dealt with this problem? I've considered filtering the data on input into the DB, but then I'd have to put the filtering everywhere, and every character would need to be checked.

Any other suggestions?

EDIT: The data is typically stored in varchar columns of various length. The data is actually input from users on web forms (ASP .NET app). So sometimes they copy-paste from MS Word or something and it puts these strange binary characters in.

A: 

How did the bad data get into the database? Are you using an XML column?

You can put the filtering (it's called "validation", actually) into the stored procedures used to enter data into the database, or you can add triggers to check the data regardless of where it comes from.

In general, don't allow bad data to get into the database!

John Saunders
The data is user-input stored in varchar columns in the database.
Brandon Montgomery
A: 

Is this a matter of encoding? Or is the xml just malformed? If malformed, I can't help. But for encoding... it is unfortunate that ExecuteXmlReader doesn't let you specify the encoding, but you could treat the data as a BLOB, and process it separately with your own encoding and XmlReader?

If the data is large, you'd probably want to use ExecuteReader with CommandBehavior.SequentialAccess and write it to a temporary file (Path.GetTempFileName()) - then process that file as a Stream with XmlReader.

Marc Gravell
A: 

I've abstracted the making of SqlParameter objects everywhere in the application already, so I'll scrub the input at that point. My abstraction method creates and returns a SqlParameter object to use in a stored procedure call. If it's a varchar the caller wants, I'll loop through each character of the string they want to make into a SqlParameter object and filter out those illegal binary XML characters. That will eliminate the bad data from entering the database in the first place.

Brandon Montgomery
A: 

How is your stored procedure produce the XML? If you use any of the FOR XML options in SQL Server, binary characters in text fields will be properly escaped:

CREATE TABLE test (
   id int identity(1,1) not null primary key, 
   data nvarchar(50))
INSERT INTO test (data) values (char(0))
SELECT * FROM test FOR XML RAW

produces:

<row ID="1" data="&#x0;" />
Robert Rossney
I am using "For Xml Explicit"
Brandon Montgomery
That shouldn't matter; FOR XML EXPLICIT properly escapes binary XML characters too.
Robert Rossney
+1  A: 

I've have seen the DotNet SqlClient "scramble" data from nvarchar columns in the database, our theory that was its something to do with "surrogate code points", see:

http://blogs.msdn.com/michkap/archive/2005/07/27/444101.aspx

http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=rzaaxsurrogate.htm

http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/c0004816.htm

SqlClient seemed to "interpret" some of the bytes meaing that our Xml was no longer well formed, converting to nvarchar(max) seemed to stop this (although this did have a performance impact):

SELECT CONVERT(NVARCHAR(MAX), MyValue) FROM ...

Note that you need to use NVARCHAR(MAX), NVARCHAR( N ) doesnt work.

We also found that the OleDB provider works correctly as well (although it is slower than the SqlClient).

Kragen
A: 

We have the same problem. People paste weird, invisible characters from Word into a form. We ended up replacing every single problematic character when inserting into the DB. Quite cumbersome.

To elaborate on the below example, the text fields do get escaped but the result is not compatible when converting the result to xml, as shown here:

SELECT * FROM test FOR XML RAW

yields

<row id="1" data="&#x0;" />

but

declare @variable xml;
select @variable = (SELECT * FROM test FOR XML RAW)

gives

Msg 9420, Level 16, State 1, Line 2
XML parsing: line 1, character 24, illegal xml character

An error in sql server 2008?

Martin Ørding-Thomsen