I have an object that I am serializing to XML. It appears that a value in one of its properties contains the hex character 0x1E. I've tried setting the Encoding property of XmlWriterSettings to both "utf-16" and "unicode", but I still get an exception:

There was an error generating the XML document. ---> System.InvalidOperationException: There was an error generating the XML document. ---> System.ArgumentException: '', hexadecimal value 0x1E, is an invalid character.

Is there any way to get these characters into the XML? If not, are there other characters that will cause problems?

Sonny Boy
+2  A: 

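The XML Recommendation (aka the spec), http://www.w3.org/TR/2000/REC-xml-20001006, defines which characters are allowed. Anything outside the Char production quoted below, including 0x1E, cannot appear in an XML 1.0 document at all, not even as an escaped character reference: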


2.2 Characters

[Definition: A parsed entity contains text, a sequence of characters, which may represent markup or character data.] [Definition: A character is an atomic unit of text as specified by ISO/IEC 10646 [ISO/IEC 10646] (see also [ISO/IEC 10646-2000]). Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646. The versions of these standards cited in A.1 Normative References were current at the time this document was prepared. New characters may be added to these standards by amendments or new editions. Consequently, XML processors must accept any character in the range specified for Char. The use of "compatibility characters", as defined in section 6.8 of [Unicode] (see also D21 in section 3.6 of [Unicode3]), is discouraged.]

Character Range

[2]     Char     ::=     #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
                         /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

The mechanism for encoding character code points into bit patterns may vary from entity to entity. All XML processors must accept the UTF-8 and UTF-16 encodings of 10646; the mechanisms for signaling which of the two is in use, or for bringing other encodings into play, are discussed later, in 4.3.3 Character Encoding in Entities.
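Production [2] can be checked mechanically. Below is a minimal C# transcription of it, assuming you want to find the offending character before calling the serializer. The names XmlCharCheck, IsLegalXmlChar and FirstIllegalIndex are hypothetical helpers, not part of System.Xml. The check works on code points, so a valid surrogate pair counts as one legal character, while a lone surrogate is rejected.

```csharp
using System;

// Hypothetical helper: a direct transcription of production [2] Char (XML 1.0).
public static class XmlCharCheck
{
    // True if the Unicode code point matches [2] Char.
    public static bool IsLegalXmlChar(int codePoint)
    {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Index of the first character not allowed by [2] Char, or -1 if all are legal.
    // Surrogate pairs are decoded to a single code point before checking.
    public static int FirstIllegalIndex(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            int cp;
            if (char.IsHighSurrogate(s[i]) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
            {
                cp = char.ConvertToUtf32(s[i], s[i + 1]);
                if (!IsLegalXmlChar(cp)) return i;
                i++; // skip the low surrogate that was just consumed
                continue;
            }
            cp = s[i]; // a lone surrogate lands in D800-DFFF, which [2] excludes
            if (!IsLegalXmlChar(cp)) return i;
        }
        return -1;
    }

    public static void Main()
    {
        Console.WriteLine(FirstIllegalIndex("abc"));        // -1
        Console.WriteLine(FirstIllegalIndex("ab\u001Ec"));  // 2
        Console.WriteLine(FirstIllegalIndex("tab\there"));  // -1
    }
}
```

By this check U+001E is rejected because it is below #x20 and is not tab, line feed or carriage return. The same holds for every other C0 control character, and for U+FFFE and U+FFFF.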


peter.murray.rust
+1  A: 

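XML is meant to be human-readable, and most non-printable control characters (everything below 0x20 except tab, line feed and carriage return) are forbidden. XML 1.1 lets you represent them with decimal character references like &#30;, but XML 1.0, which is what .NET's System.Xml implements, does not allow them even in that form. The practical option is to base-64 encode the content.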
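A minimal sketch of the base-64 route, assuming UTF-8 as the byte encoding (any encoding works, as long as the reading side uses the same one). Base64Demo, ToBase64 and FromBase64 are hypothetical names. Convert.ToBase64String only emits A-Z, a-z, 0-9, '+', '/' and '=', so the result is safe in any element or attribute.

```csharp
using System;
using System.Text;

public static class Base64Demo
{
    // Encode an arbitrary string (control characters included) as base-64 text.
    public static string ToBase64(string s)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(s));
    }

    // Reverse the encoding when reading the XML back.
    public static string FromBase64(string b64)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(b64));
    }

    public static void Main()
    {
        string original = "a\u001Eb";
        string encoded = ToBase64(original);
        Console.WriteLine(encoded);                          // YR5i
        Console.WriteLine(FromBase64(encoded) == original);  // True
    }
}
```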

Dour High Arch
A: 

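Since you didn't give any details, I'm going to guess that your property is of type System.String. If so, you cannot serialize it as-is, because XmlSerializer writes strings as element text and 0x1E is not a legal XML character. Instead, serialize it as a byte[], which XmlSerializer writes as base-64 text: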

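using System.Xml.Serialization;

[XmlRoot("root")]
public class HasBase64Content
{
    // Not serialized directly: the raw string may contain characters such as 0x1E.
    [XmlIgnore]
    public string Content { get; set; }

    // XmlSerializer writes byte[] members as base-64 text, which is always legal XML.
    [XmlElement("Content")]
    public byte[] Base64Content
    {
        get
        {
            // Guard against null so serializing an object with no Content doesn't throw.
            if (Content == null)
            {
                return null;
            }

            return System.Text.Encoding.UTF8.GetBytes(Content);
        }
        set
        {
            if (value == null)
            {
                Content = null;
                return;
            }

            Content = System.Text.Encoding.UTF8.GetString(value);
        }
    }
}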
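A quick round-trip check of the class above, sketched under the assumption that the getter includes a null guard. The class is repeated so the block compiles on its own, and RoundTripDemo, ToXml and FromXml are hypothetical helper names. The serialized <Content> element should hold base-64 text, and deserializing it should restore the original string, 0x1E included.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

[XmlRoot("root")]
public class HasBase64Content
{
    [XmlIgnore]
    public string Content { get; set; }

    // Serialized by XmlSerializer as xs:base64Binary text.
    [XmlElement("Content")]
    public byte[] Base64Content
    {
        get { return Content == null ? null : Encoding.UTF8.GetBytes(Content); }
        set { Content = value == null ? null : Encoding.UTF8.GetString(value); }
    }
}

public static class RoundTripDemo
{
    static readonly XmlSerializer Serializer = new XmlSerializer(typeof(HasBase64Content));

    // Serialize a string via the byte[] property; the raw text never reaches the writer.
    public static string ToXml(string text)
    {
        using (var writer = new StringWriter())
        {
            Serializer.Serialize(writer, new HasBase64Content { Content = text });
            return writer.ToString();
        }
    }

    // Deserialize and return the decoded Content string.
    public static string FromXml(string xml)
    {
        using (var reader = new StringReader(xml))
        {
            return ((HasBase64Content)Serializer.Deserialize(reader)).Content;
        }
    }

    public static void Main()
    {
        string xml = ToXml("a\u001Eb");
        Console.WriteLine(xml);                         // <Content> holds base-64 text
        Console.WriteLine(FromXml(xml) == "a\u001Eb");  // True
    }
}
```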
John Saunders