I'm serializing an object that contains HTML data in a String Property.

Dim Formatter As New Xml.Serialization.XmlSerializer(GetType(MyObject))
Dim fs As New FileStream(FilePath, FileMode.Create)
Formatter.Serialize(fs, Ob)
fs.Close()

But when I'm reading the XML back to the Object:

Dim Formatter As New Xml.Serialization.XmlSerializer(GetType(MyObject))
Dim fs As New FileStream(FilePath, FileMode.Open)
Dim Ob = CType(Formatter.Deserialize(fs), MyObject)
fs.Close()

I get this error:

"'', hexadecimal value 0x14, is an invalid character. Line 395, position 22."

Shouldn't .NET prevent this kind of error, escaping the invalid characters?

What's happening here and how can I fix it?

A: 

It should really have failed in the serialize step, because 0x14 is an invalid value for XML. There is no way to escape it, not even with &#x14;, since it is excluded as a valid character from the XML model. I am actually surprised that the serializer lets this through, as it makes the serializer a non-conforming one.

Is it possible for you to remove the invalid characters from the string before serializing it? For what purpose do you have an 0x14 in HTML?

Or, is it possible you are writing with one encoding, and reading with a different one?
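
If removing the characters is acceptable, the clean-up could look roughly like the sketch below. The class and method names (XmlText, StripInvalidXmlChars) are made up for illustration, and the sketch assumes XmlConvert.IsXmlChar is available (.NET Framework 4.0 and later). It silently drops anything XML 1.0 does not allow, so it only fits cases where those characters carry no meaning.

```csharp
using System.Text;
using System.Xml;

// Hypothetical helper, not part of the original question or of any library.
public static class XmlText
{
    // Returns a copy of text without characters that XML 1.0 forbids (such as 0x14).
    public static string StripInvalidXmlChars(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsSurrogatePair(text, i))
            {
                // A well-formed surrogate pair encodes a character above U+FFFF; keep both halves.
                sb.Append(text[i]).Append(text[i + 1]);
                i++;
            }
            else if (XmlConvert.IsXmlChar(text[i]))
            {
                sb.Append(text[i]);
            }
            // Anything else (control characters, lone surrogates) is dropped.
        }
        return sb.ToString();
    }
}
```

You would call it on the HTML string just before handing the object to the serializer, for example `Ob.Html = XmlText.StripInvalidXmlChars(Ob.Html)` (the property name is assumed).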

lavinio
Well, I've gone with this solution: I removed the invalid chars from the String before serializing. But I still don't understand why XmlSerializer can't deserialize an object that it serialized itself.
DK39
You're in good shape, unless the invalid characters were actually important.
John Saunders
A: 

I would expect .NET to handle this, but you can also have a look at the XmlSerializer class and XmlReaderSettings (see the sample generic method below):

public static T Deserialize<T>(string xml)
{
    var xmlReaderSettings = new XmlReaderSettings()
                                {
                                    ConformanceLevel = ConformanceLevel.Fragment,
                                    ValidationType = ValidationType.None
                                };

    XmlReader xmlReader = XmlTextReader.Create(new StringReader(xml), xmlReaderSettings);
    XmlSerializer xs = new XmlSerializer(typeof(T), "");

    return (T)xs.Deserialize(xmlReader);
}

I would also check for encoding issues (Unicode, UTF-8, etc.) in your code. Hexadecimal value 0x14 is not a char you would expect in XML :)
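
For this particular error, the reader setting that matters is probably CheckCharacters. The following is a minimal sketch, assuming the file contains the offending character as a numeric character reference (such as &#x14;) rather than as a raw byte. The class name LenientXml is made up for illustration.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical helper, shown only to illustrate the CheckCharacters setting.
public static class LenientXml
{
    public static T Deserialize<T>(string xml)
    {
        // CheckCharacters = false turns off character checking for
        // character entity references, so &#x14; no longer throws.
        var settings = new XmlReaderSettings { CheckCharacters = false };

        // Both readers implement IDisposable, so dispose them deterministically.
        using (var stringReader = new StringReader(xml))
        using (var xmlReader = XmlReader.Create(stringReader, settings))
        {
            var serializer = new XmlSerializer(typeof(T));
            return (T)serializer.Deserialize(xmlReader);
        }
    }
}
```

The resulting document is still not valid XML 1.0, so other parsers may reject the same file. This only makes the .NET round trip tolerant.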

Piotr Owsiak
-1 for: no using blocks, using XmlTextReader, suggesting a solution without knowing the problem.
John Saunders
What's the issue not using 'using' blocks?
Ian
Resource leaks. Both XmlReader and StringReader implement IDisposable.
John Saunders
Ah, ok. Fair point :)
Ian
You're right John, thanks. However, it seems you did not know the problem either, and yet you tried to force your solution on DK39. And btw, voting my answer down to get yours higher seems soooo lame :-P
Piotr Owsiak
A: 

You should really post the code of the class you're trying to serialize and deserialize. In the meantime, I'll make a guess.

Most likely, the invalid character is in a field or property of type string. You will need to serialize that as an array of bytes, assuming you can't avoid having that character present at all:

[XmlRoot("root")]
public class HasBase64Content
{
    internal HasBase64Content()
    {
    }

    [XmlIgnore]
    public string Content { get; set; }

    [XmlElement]
    public byte[] Base64Content
    {
        get
        {
            // GetBytes throws on null, so pass a null Content through unchanged.
            return Content == null ? null : System.Text.Encoding.UTF8.GetBytes(Content);
        }
        set
        {
            if (value == null)
            {
                Content = null;
                return;
            }

            Content = System.Text.Encoding.UTF8.GetString(value);
        }
    }
}

This produces XML like the following:

<?xml version="1.0" encoding="utf-8"?>
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <Base64Content>AAECAwQFFA==</Base64Content>
</root>


I see you'd probably prefer VB.NET:

<XmlRoot("root")> _
Public Class HasBase64Content

    Private _content As String
    <XmlIgnore()> _
    Public Property Content() As String
        Get
            Return _content
        End Get
        Set(ByVal value As String)
            _content = value
        End Set
    End Property

    <XmlElement()> _
    Public Property Base64Content() As Byte()
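        Get
            ' GetBytes throws on Nothing, so pass a missing Content through unchanged.
            If Content Is Nothing Then
                Return Nothing
            End If
            Return System.Text.Encoding.UTF8.GetBytes(Content)
        End Get
        Set(ByVal value As Byte())
            If value Is Nothing Then
                Content = Nothing
                Return
            End If
            Content = System.Text.Encoding.UTF8.GetString(value)
        End Set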
    End Property
End Class
John Saunders
Hi John. The problem here is not Serializing an Object with invalid characters. The problem is why Xml.Serialization.XmlSerializer doesn't escape the invalid characters when Serializing.
DK39
Depending on what he's serializing, it's probably not supposed to escape it. He needs to show what he is serializing.
John Saunders
BTW, DK39, check my profile. I'm a bit of an expert in this area. It's not about escaping.
John Saunders
OK, but I still don't understand why XmlSerializer can't deserialize an object with a String that it serialized itself.
DK39
It might very well be a bug. Maybe it should have failed to serialize it. It doesn't matter - it won't be fixed. The real question is how to reliably serialize and deserialize arbitrary strings that don't fit the XML definition of a string. The answer is above.
John Saunders