I'm using the following code to serialise an object:

public static string Serialise(IMessageSerializer messageSerializer, DelayMessage message)
{
    using (var stream = new MemoryStream())
    {
        messageSerializer.Serialize(new[] { message }, stream);

        return Encoding.UTF8.GetString(stream.ToArray());
    }
}

Unfortunately, when I save it to a database (using LINQ to SQL), then query the database, the string appears to start with a question mark:

?<z:anyType xmlns...

How do I get rid of that? When I try to de-serialise using the following:

public static DelayMessage Deserialise(IMessageSerializer messageSerializer, string data)
{
    using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(data)))
    {
        return (DelayMessage)messageSerializer.Deserialize(stream)[0];
    }
}

I get the following exception:

"Error in line 1 position 1. Expecting element 'anyType' from namespace 'http://schemas.microsoft.com/2003/10/Serialization/'.. Encountered 'Text' with name '', namespace ''. "

The implementations of the messageSerializer use the DataContractSerializer as follows:

public void Serialize(IMessage[] messages, Stream stream)
{
    var xws = new XmlWriterSettings { ConformanceLevel = ConformanceLevel.Fragment };
    using (var xmlWriter = XmlWriter.Create(stream, xws))
    {
        var dcs = new DataContractSerializer(typeof(IMessage), knownTypes);
        foreach (var message in messages)
        {
            dcs.WriteObject(xmlWriter, message);
        }
    }
}

public IMessage[] Deserialize(Stream stream)
{
    var xrs = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
    using (var xmlReader = XmlReader.Create(stream, xrs))
    {
        var dcs = new DataContractSerializer(typeof(IMessage), knownTypes);
        var messages = new List<IMessage>();
        while (false == xmlReader.EOF)
        {
            var message = (IMessage)dcs.ReadObject(xmlReader);
            messages.Add(message);
        }
        return messages.ToArray();
    }
}
+1  A: 

Unfortunately, when I save it to a database (using LINQ to SQL), then query the database, the string appears to start with a question mark:

?<z:anyType xmlns...

Your database is not set up to store Unicode characters. The string you write starts with a BOM (U+FEFF); the database can't store that character, so it mangles it into a '?'. When you then read the string back as XML, the '?' is text content outside the root element, and you get an error. (Only whitespace text is allowed outside the root element.)
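
To see the mangling in isolation, here is a minimal sketch. It uses Encoding.ASCII as a stand-in for a non-Unicode column; that substitution is an assumption for illustration (a real varchar column uses its collation's code page), and the class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class BomMangleDemo
{
    // Round-trips text through ASCII, which stands in here for a
    // non-Unicode column (an assumption; a real varchar column uses
    // the code page of its collation).
    public static string StoreInNonUnicodeColumn(string value)
    {
        // Characters that ASCII cannot represent are replaced with '?'.
        byte[] stored = Encoding.ASCII.GetBytes(value);
        return Encoding.ASCII.GetString(stored);
    }
}
```

Passing "\uFEFF<z:anyType />" to StoreInNonUnicodeColumn gives back "?<z:anyType />", which matches the leading question mark seen in the question.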

Why is the BOM getting there? Because Microsoft love dropping BOMs all over the place, even when they're not needed (and they never are, with UTF-8). The solution is to make your own instance of UTF8Encoding instead of using the built-in Encoding.UTF8, and tell it you don't want its stupid BOMs:

Encoding utf8onlynotasridiculouslysucky = new UTF8Encoding(false);
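
As a rough sketch of where such an instance might be plugged in: this assumes the BOM comes from the XmlWriter, whose settings default to Encoding.UTF8 and so emit a preamble when writing to a stream. The class and method names below are made up for illustration and are not part of the question's code.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class BomFreeXml
{
    // UTF-8 encoding that does not emit a byte order mark.
    private static readonly Encoding Utf8NoBom = new UTF8Encoding(false);

    // Writes a single element as an XML fragment and returns it as a string.
    public static string WriteFragment(string elementName, string text)
    {
        var settings = new XmlWriterSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment,
            Encoding = Utf8NoBom // assumed fix: the writer no longer emits a BOM
        };
        using (var stream = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(stream, settings))
            {
                writer.WriteElementString(elementName, text);
            }
            // The writer is disposed (and flushed) before the bytes are decoded.
            return Utf8NoBom.GetString(stream.ToArray());
        }
    }
}
```

In the question's Serialize method, the equivalent change would be setting Encoding on the existing XmlWriterSettings, so that the string handed to the database starts with '<' rather than U+FEFF.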

However, this is only really masking the real issue, which is the database configuration.

bobince
Okay, so what configuration should I change in the database?
Neil Barnwell
Oh, good answer, btw. I had thought I was going a bit mad and I'm not happy with just trimming question marks and things.
Neil Barnwell
No idea... What's the database server? How are you connecting to it? What's in the schema?
bobince
SQL Server, connecting with System.Data.SqlClient. I've tried `Text` and `varchar(max)` as the column datatype for the serialised data.
Neil Barnwell
Aha, I think the usual approach with SQL Server would be to use NVARCHAR(MAX), which is native-Unicode (stored as UTF-16LE).
bobince