I'm trying to create a piece of XML. I've created the data classes with xsd.exe. The root class is MESSAGE.

So after creating a MESSAGE and filling all its properties, I serialize it like this:

serializer = new XmlSerializer(typeof(Xsd.MESSAGE));
StringWriter sw = new StringWriter();
serializer.Serialize(sw, response);
string xml = sw.ToString();

Up to this point all goes well: the string xml contains valid (UTF-16 encoded) XML. Now I'd like to create the XML with UTF-8 encoding instead, so I do it like this:

Edit: forgot to include the declaration of the stream

serializer = new XmlSerializer(typeof(Xsd.MESSAGE));
using (MemoryStream stream = new MemoryStream())
{
    XmlTextWriter xtw = new XmlTextWriter(stream, Encoding.UTF8);
    serializer.Serialize(xtw, response);
    string xml = Encoding.UTF8.GetString(stream.ToArray());
}

And here comes the problem: using this approach, the XML string is prefixed with an invalid char (the infamous square).
When I inspect the char like this:

char c = xml[0];

I can see that c has a value of 65279.
Does anybody have a clue where this is coming from?
I can easily solve this by cutting off the first char:

xml = xml.Substring(1);

But I'd rather know what's going on than blindly cut off the first char.

Can anybody shed some light on this? Thanks!

+4  A: 

65279 (U+FEFF) is the Unicode byte order mark (BOM). Are you sure you're getting 65249? Assuming it really is the BOM, you could get rid of it by creating a UTF8Encoding instance which doesn't emit a BOM. (See the constructor overloads for details.)
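
To illustrate where the character comes from, here is a minimal sketch. The class name BomDemo is made up for this example; the calls themselves are standard System.Text APIs. It compares the preamble of Encoding.UTF8 with that of a UTF8Encoding created with false, and shows that GetString decodes the preamble bytes rather than stripping them.

```csharp
using System;
using System.Text;

// Hypothetical demo class, not part of the question's code.
public static class BomDemo
{
    public static void Main()
    {
        // Encoding.UTF8 is configured to emit a byte order mark.
        byte[] withBom = Encoding.UTF8.GetPreamble();

        // Passing false to the constructor asks for no byte order mark.
        byte[] withoutBom = new UTF8Encoding(false).GetPreamble();

        Console.WriteLine(BitConverter.ToString(withBom)); // prints "EF-BB-BF"
        Console.WriteLine(withoutBom.Length);              // prints 0

        // GetString does not strip the BOM; it decodes it to one char, U+FEFF.
        string decoded = Encoding.UTF8.GetString(withBom);
        Console.WriteLine((int)decoded[0]);                // prints 65279
    }
}
```

That would explain the question's symptom: the writer puts the three preamble bytes into the stream, and Encoding.UTF8.GetString turns them back into the single char 65279 at the start of the string.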

However, there's an easier way of getting UTF-8 out. You can still use a StringWriter, but through a derived class which overrides the Encoding property. See this answer for an example.
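
The derived-class approach can be sketched roughly as follows. This is only an illustration under stated assumptions: Utf8StringWriter and ToUtf8Xml are names invented here, and Message is a stand-in for the xsd.exe-generated Xsd.MESSAGE class, which isn't shown in the question.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical helper: a StringWriter that reports UTF-8 instead of UTF-16.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

// Stand-in for the generated Xsd.MESSAGE class (assumption for this sketch).
public class Message
{
    public string Text { get; set; }
}

public static class Program
{
    // Serializes to a string; no bytes are written, so no BOM is involved.
    public static string ToUtf8Xml(Message message)
    {
        var serializer = new XmlSerializer(typeof(Message));
        using (var writer = new Utf8StringWriter())
        {
            serializer.Serialize(writer, message);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        string xml = ToUtf8Xml(new Message { Text = "hello" });
        Console.WriteLine(xml);
    }
}
```

With this, the XML declaration should name utf-8 rather than utf-16, because the serializer takes the encoding name from the writer's Encoding property. The string itself is still an ordinary .NET string, so it only becomes UTF-8 bytes when you encode or write it out.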

Jon Skeet
I ran the code and got 65279, too. Probably a typo in the question.
Chris W. Rea
A typo indeed... updated ;-)
fretje
BOM: See http://en.wikipedia.org/wiki/Byte-order_mark
Chris W. Rea
I don't find creating a new class necessarily *easier*... what I would find easier is that I could *set* the Encoding of a StringWriter without having to derive from it.
fretje
@fretje: Yes, but deriving a new class is easier than changing the .NET framework :) And the point about deriving a new class being easier than using XmlTextWriter is that you only have to do it in one place, ever.
Jon Skeet
@Jon: Agreed. I'll take this approach if I ever need this a second time in the same project ;-)
fretje
+2  A: 

Here's your code modified to not prepend the byte-order-mark (BOM):

var serializer = new XmlSerializer(typeof(Xsd.MESSAGE));
// Passing false tells UTF8Encoding not to emit a byte order mark.
Encoding utf8EncodingWithNoByteOrderMark = new UTF8Encoding(false);
string xml;
using (MemoryStream stream = new MemoryStream())
{
    XmlTextWriter xtw = new XmlTextWriter(stream, utf8EncodingWithNoByteOrderMark);
    serializer.Serialize(xtw, response);
    xml = Encoding.UTF8.GetString(stream.ToArray());
}
Chris W. Rea
I used this solution, so I accepted this answer. Thanks!
fretje