Problem

Using some samples I found online, I've written a pair of XML serialization methods:

  • Method1: Serializes an object and returns (a) its type and (b) the XML string.
  • Method2: Takes (a) and (b) above and gives you back the object.

I noticed that the XML string returned by Method1 starts with a leading '?'. This seems to be fine when using Method2 to reconstruct the object.

But when testing in the application, we sometimes got a leading '???' instead, which caused Method2 to throw an exception while reconstructing the object. The object in this case was just a simple int.

System.InvalidOperationException was unhandled
  Message="There is an error in XML document (1, 1)."
  Source="System.Xml"
  StackTrace:
       at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)
       at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle)
       at System.Xml.Serialization.XmlSerializer.Deserialize(Stream stream)
       at XMLSerialization.Program.DeserializeXmlStringToObject(String xmlString, String objectType) in C:\Documents and Settings\...Projects\XMLSerialization\Program.cs:line 96
       at XMLSerialization.Program.Main(String[] args) in C:\Documents and Settings\...Projects\XMLSerialization\Program.cs:line 49

Would anyone be able to shed some light on what might be causing this?

Sample Code

Here's sample code from the mini-tester I wrote while coding this up; it runs as a Visual Studio console app and prints the XML string. You can also uncomment the "include leading ???" region to prepend the extra '??' and reproduce the exception.



using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

namespace XMLSerialization
{
    class Program
    {
        static void Main(string[] args)
        {
            // serialize to string
            #region int
            object inObj = 5;
            #endregion

            #region string
            //object inObj = "Testing123";
            #endregion

            #region list
            //List<string> inObj = new List<string>();
            //inObj.Add("0:25");
            //inObj.Add("1:26");
            #endregion

            string[] stringArray = SerializeObjectToXmlString(inObj);

            #region include leading ???
            //int indexOfBracket = stringArray[0].IndexOf('<');
            //stringArray[0] = "??" + stringArray[0];
            #endregion

            #region strip out leading ???
            //int indexOfBracket = stringArray[0].IndexOf('<');
            //string trimmedString = stringArray[0].Substring(indexOfBracket);
            //stringArray[0] = trimmedString;
            #endregion

            Console.WriteLine("Input");
            Console.WriteLine("-----");
            Console.WriteLine("Object Type: " + stringArray[1]);
            Console.WriteLine();
            Console.WriteLine("XML String: " + Environment.NewLine + stringArray[0]);
            Console.WriteLine(String.Empty);

            // deserialize back to object
            object outObj = DeserializeXmlStringToObject(stringArray[0], stringArray[1]);

            Console.WriteLine("Output");
            Console.WriteLine("------");

            #region int
            Console.WriteLine("Object: " + (int)outObj);
            #endregion

            #region string
            //Console.WriteLine("Object: " + (string)outObj);
            #endregion

            #region list
            //string[] tempArray;
            //List<string> list = (List<string>)outObj;

            //foreach (string pair in list)
            //{
            //    tempArray = pair.Split(':');
            //    Console.WriteLine(String.Format("Key:{0} Value:{1}", tempArray[0], tempArray[1]));
            //}
            #endregion

            Console.Read();
        }

        private static string[] SerializeObjectToXmlString(object obj)
        {
            XmlTextWriter writer = new XmlTextWriter(new MemoryStream(), Encoding.UTF8);
            writer.Formatting = Formatting.Indented;
            XmlSerializer serializer = new XmlSerializer(obj.GetType());
            serializer.Serialize(writer, obj);

            MemoryStream stream = (MemoryStream)writer.BaseStream;
            string xmlString = UTF8ByteArrayToString(stream.ToArray());

            string objectType = obj.GetType().FullName;

            return new string[]{xmlString, objectType};
        }

        private static object DeserializeXmlStringToObject(string xmlString, string objectType)
        {
            MemoryStream stream = new MemoryStream(StringToUTF8ByteArray(xmlString));
            XmlSerializer serializer = new XmlSerializer(Type.GetType(objectType));

            object obj = serializer.Deserialize(stream);

            return obj;
        }

        private static string UTF8ByteArrayToString(Byte[] characters)
        {
            UTF8Encoding encoding = new UTF8Encoding();
            return encoding.GetString(characters);
        }

        private static byte[] StringToUTF8ByteArray(String pXmlString)
        {
            UTF8Encoding encoding = new UTF8Encoding();
            return encoding.GetBytes(pXmlString);
        } 


    }
}
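To see what the leading '?' actually is, it can help to look at the raw bytes before they are decoded into a string. A minimal diagnostic sketch, assuming the same XmlTextWriter plus Encoding.UTF8 setup as SerializeObjectToXmlString above; the class BomDiagnostics and its methods are hypothetical names, not part of the original code:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical diagnostic helper: serializes like SerializeObjectToXmlString,
// but returns the raw bytes so they can be inspected before decoding.
static class BomDiagnostics
{
    public static byte[] SerializeToBytes(object obj)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            // Same writer setup as the question: XmlTextWriter over a MemoryStream with Encoding.UTF8
            XmlTextWriter writer = new XmlTextWriter(stream, Encoding.UTF8);
            writer.Formatting = Formatting.Indented;
            new XmlSerializer(obj.GetType()).Serialize(writer, obj);
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Formats the first `count` bytes as hex, e.g. "EF-BB-BF"
    public static string FirstBytesHex(byte[] bytes, int count)
    {
        return BitConverter.ToString(bytes, 0, Math.Min(count, bytes.Length));
    }
}
```

With this setup the first three bytes come out as EF-BB-BF (the UTF-8 byte order mark), followed by '<' from the XML declaration.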


A:

When I've come across this before, it usually had to do with encoding. I'd try specifying the encoding when you serialize your object. Try using the following code. Also, is there a specific reason you need to return a string[]? I've changed your methods to use generics so you don't have to specify a type.

using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

private static string SerializeObjectToXmlString<T>(T obj)
{
    XmlSerializer xmls = new XmlSerializer(typeof(T));
    using (MemoryStream ms = new MemoryStream())
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Encoding = Encoding.UTF8;
        settings.Indent = true;
        settings.IndentChars = "\t";
        settings.NewLineChars = Environment.NewLine;
        settings.ConformanceLevel = ConformanceLevel.Document;

        using (XmlWriter writer = XmlWriter.Create(ms, settings))
        {
            xmls.Serialize(writer, obj);
        }

        string xml = Encoding.UTF8.GetString(ms.ToArray());
        return xml;
    }
}

private static T DeserializeXmlStringToObject<T>(string xmlString)
{
    XmlSerializer xmls = new XmlSerializer(typeof(T));

    using (MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes(xmlString)))
    {
        return (T)xmls.Deserialize(ms);
    }
}

If you still have problems, try using Encoding.ASCII in your code anywhere you see Encoding.UTF8, unless you have a specific reason for using UTF8. I'm not sure of the cause, but I've seen UTF8 encoding cause this exact problem in certain cases when serializing.
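A plausible reason the Encoding.ASCII workaround helps: ASCII has an empty preamble, so no byte order mark is written, whereas Encoding.UTF8 has a three-byte preamble (EF BB BF). The catch is that ASCII replaces any non-ASCII character with '?'. A UTF8Encoding constructed with `false` also has an empty preamble, so it avoids the BOM without that loss. This sketch only compares the preambles; PreambleCheck is a hypothetical name:

```csharp
using System;
using System.Text;

// Hypothetical helper: compares the byte order marks (preambles) that
// StreamWriter/XmlWriter write first when starting at the beginning of a stream.
static class PreambleCheck
{
    public static int PreambleLength(Encoding encoding)
    {
        return encoding.GetPreamble().Length;
    }

    public static void Print()
    {
        Console.WriteLine("Encoding.UTF8:           " + PreambleLength(Encoding.UTF8));           // EF BB BF
        Console.WriteLine("new UTF8Encoding(false): " + PreambleLength(new UTF8Encoding(false))); // no BOM
        Console.WriteLine("Encoding.ASCII:          " + PreambleLength(Encoding.ASCII));          // no BOM
    }
}
```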

Dan Herbert
Thanks for the feedback! I think I am specifying the encoding when I instantiate the XmlTextWriter, or is that not it? The string array was there so I could return the XML string and control the type string used for deserialization.
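On that comment: passing Encoding.UTF8 to the XmlTextWriter constructor does specify the encoding, but that encoding object also writes a UTF-8 BOM at the start of the stream, and decoding the bytes turns it into a leading U+FEFF character. A minimal sketch of Method1, assuming the XML only needs to live in a .NET string, passes `new UTF8Encoding(false)` so no BOM is written. The class BomFreeSerializer is hypothetical; its method mirrors the question's SerializeObjectToXmlString:

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical variant of the question's Method1 that writes no BOM.
static class BomFreeSerializer
{
    public static string[] SerializeObjectToXmlString(object obj)
    {
        // UTF-8 without a byte order mark (encoderShouldEmitUTF8Identifier = false)
        Encoding utf8NoBom = new UTF8Encoding(false);

        using (MemoryStream stream = new MemoryStream())
        {
            XmlTextWriter writer = new XmlTextWriter(stream, utf8NoBom);
            writer.Formatting = Formatting.Indented;
            new XmlSerializer(obj.GetType()).Serialize(writer, obj);
            writer.Flush();

            // Decoding with the same encoding: the string now starts with '<'
            string xmlString = utf8NoBom.GetString(stream.ToArray());
            return new string[] { xmlString, obj.GetType().FullName };
        }
    }
}
```

The output can still be fed to the question's DeserializeXmlStringToObject unchanged.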
A: 

This is the BOM (byte order mark) character. You can either remove it:

if (xmlString.Length > 0 && xmlString[0] != '<')
{
    xmlString = xmlString.Substring(1);
}
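The stray character is U+FEFF, which the console typically can't display and prints as '?'. A slightly narrower variant of the check above, sketched here with hypothetical names, removes only BOM characters rather than whatever the first character happens to be. If the BOM has already been mangled into literal '?' characters (the '???' case), neither check helps: the snippet above strips just one of them, and this one strips none.

```csharp
// Hypothetical helper for detecting and removing a decoded BOM.
static class BomStripper
{
    // U+FEFF is what the UTF-8 BOM bytes (EF BB BF) become after UTF-8 decoding
    private const char ByteOrderMark = '\uFEFF';

    public static bool StartsWithBom(string xml)
    {
        return xml.Length > 0 && xml[0] == ByteOrderMark;
    }

    // Removes leading BOM characters only; every other character is left untouched
    public static string StripBom(string xml)
    {
        return xml.TrimStart(ByteOrderMark);
    }
}
```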

Or serialize straight to a string with a StringWriter, so the XML is never encoded to bytes and no BOM is written (.NET strings are UTF-16):

using System.Globalization;
using System.IO;

string xml;
using (StringWriter writer = new StringWriter(CultureInfo.InvariantCulture))
{
    serializer.Serialize(writer, instance);
    xml = writer.ToString();
}

And deserialize:

object result;
using (StringReader reader = new StringReader(xml))
{
    result = serializer.Deserialize(reader);
}

If you are using this code only inside .NET applications, UTF-16 won't create problems, as it's the native encoding of .NET strings.
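The StringWriter/StringReader approach can be packaged as a self-contained round trip; this sketch also uses generics, as in the first answer. Because StringWriter reports UTF-16, the XML declaration reads encoding="utf-16". StringXmlSerializer and its methods are hypothetical names:

```csharp
using System.Globalization;
using System.IO;
using System.Xml.Serialization;

// Hypothetical generic wrapper around the StringWriter/StringReader approach.
static class StringXmlSerializer
{
    public static string Serialize<T>(T instance)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(T));
        using (StringWriter writer = new StringWriter(CultureInfo.InvariantCulture))
        {
            // Writes characters, not bytes, so no BOM is produced
            serializer.Serialize(writer, instance);
            return writer.ToString();
        }
    }

    public static T Deserialize<T>(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(T));
        using (StringReader reader = new StringReader(xml))
        {
            return (T)serializer.Deserialize(reader);
        }
    }
}
```

For example, `StringXmlSerializer.Deserialize<int>(StringXmlSerializer.Serialize(5))` gives back 5, and the same works for the question's List<string>.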

Sergej Andrejev
Thanks for the response! If this is the BOM, would you know why it sometimes shows up as a single leading '?' and sometimes as three ('???')?
My only guess is that the serialized data is being read with a different encoding than it was written with. As for the length, it could vary because the BOM's byte length differs between Unicode encodings. Still, I would suggest sticking with UTF-16 (plain .NET strings) if you don't communicate with other systems.
Sergej Andrejev
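The mismatched-encoding guess can be illustrated. The UTF-8 BOM is three bytes (EF BB BF). Decoded as UTF-8, they become one character, U+FEFF, which shows as a single '?'. Decoded byte by byte with a single-byte code page, they become three characters; if that text is then converted to ASCII (or another code page lacking them), each turns into a literal '?', giving '???'. In the question's round trip, U+FEFF is re-encoded to EF BB BF, which the XML reader recognizes as a BOM; three literal '?' characters before '<' are not valid XML, which fits the "error in XML document (1, 1)". This sketch assumes Latin-1 as a stand-in for whatever code page the application actually used; BomMangling is a hypothetical name:

```csharp
using System.Text;

// Hypothetical demo of how one BOM can surface as '?' or '???'.
static class BomMangling
{
    static readonly byte[] Utf8Bom = { 0xEF, 0xBB, 0xBF };

    // Matching decoding: the three BOM bytes become one character, U+FEFF
    public static string DecodeAsUtf8()
    {
        return Encoding.UTF8.GetString(Utf8Bom);
    }

    // Mismatched decoding (assumed Latin-1): each byte becomes its own character,
    // then a round trip through ASCII replaces each one with a literal '?'
    public static string DecodeAsLatin1ThenAscii()
    {
        string latin1 = Encoding.GetEncoding("iso-8859-1").GetString(Utf8Bom);
        return Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(latin1));
    }
}
```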