views: 300, answers: 3

I'm currently serializing an object using XMLSerializer, and the resulting XML starts with:

<?xml version="1.0" encoding="utf-16"?>

Which I would like to get rid of, because in this particular case I don't need it (I'll only use the serialized string to deserialize later with my own code, so I can re-add it later when needed).
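For context, a minimal sketch of the kind of serialization being described (`Widget` is a placeholder type, not from the original post) — serializing to a `StringWriter` backed by a `StringBuilder`, which is what produces the `utf-16` declaration:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

public class Widget
{
    public int Id { get; set; }
}

class Program
{
    static void Main()
    {
        var serializer = new XmlSerializer(typeof(Widget));
        var sb = new StringBuilder();

        // StringWriter targets a .NET string, and .NET strings are UTF-16,
        // so the declaration comes out as encoding="utf-16".
        using (var writer = new StringWriter(sb))
        {
            serializer.Serialize(writer, new Widget { Id = 1 });
        }

        // First line of the output:
        // <?xml version="1.0" encoding="utf-16"?>
        Console.WriteLine(sb.ToString());
    }
}
```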

I'm also trying to do this as fast as possible, since we'll be doing TONS of these serializations.

So the question is, can I count on this signature to always be exactly the same? (As in, can I just remove the first 39 characters of the resulting string, and then add back that exact same string when deserializing?)

Or can something make the encoding be different, for example?

Thanks

A: 

No, you cannot assume that the XML declaration will always be the same as there are a myriad of different encodings that could have been used (among other things).

It is always better to not mangle an XML string in this manner before using it.

Andrew Hare
Ok, now, what determines what encoding will be used? Can I force it somehow? I haven't found any place to specify it.
Daniel Magliola
I don't understand the question - when are you using the encoding?
Andrew Hare
I'm not using the encoding. My question is: The XML comes out with an XML signature that says the encoding is utf-16. Why is this particular encoding chosen? Is this hard-coded in the framework? Can I change it? Is it dependent on the weather, or the version of the host machine's OS?
Daniel Magliola
A: 

Where exactly would your speed improvement come from with this optimization? Are you certain that removing and re-adding 39 characters to a string would be faster than serializing an extra 39 characters? (My contention would be that it would not.)

McWafflestix
Oh, no, removing the 39 characters is not a SPEED improvement at all. I want to remove them so the resulting string is smaller/cleaner, but it won't make ANYTHING faster. The speed concern is just about not making this run slower, e.g. by running a regex to remove that starting piece. If I just remove the first "n" characters, I can do it within the same StringBuilder I already have. Truth be told, a regex may not make a measurable difference; I haven't actually measured it.
Daniel Magliola
What I mainly want to know is what defines that encoding, so that I can fix it and make sure it won't change.
Daniel Magliola
+2  A: 

The answer to your question is in the code you didn't show us - how you did the serialization. You probably serialized to a StringWriter, or directly to a StringBuilder. Strings in .NET are UTF-16. If you serialize to a string, you have no choice but to get UTF-16 encoding.

In other situations, the encoding is dictated by the destination. If you serialize to a TextWriter of some sort, then the encoding of the TextWriter will be used, unless overridden. If you serialize to an XmlWriter, then the XmlWriterSettings will determine the encoding used.
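To illustrate that second case: if the goal is really just to drop (or control) the declaration, `XmlWriterSettings` supports that directly, which is safer than trimming characters off the string. A sketch, again using a hypothetical `Widget` type:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public class Widget
{
    public int Id { get; set; }
}

class Program
{
    static void Main()
    {
        var serializer = new XmlSerializer(typeof(Widget));

        var settings = new XmlWriterSettings
        {
            // The writer's encoding determines what the declaration would say.
            // UTF8Encoding(false) avoids writing a byte-order mark.
            Encoding = new UTF8Encoding(false),

            // Suppress the <?xml ... ?> declaration entirely,
            // instead of stripping it off the string afterwards.
            OmitXmlDeclaration = true
        };

        using (var stream = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(stream, settings))
            {
                serializer.Serialize(writer, new Widget { Id = 1 });
            }
            // Output starts directly with the root element, no declaration.
            Console.WriteLine(Encoding.UTF8.GetString(stream.ToArray()));
        }
    }
}
```

When deserializing later, `XmlSerializer.Deserialize` accepts the declaration-free document without needing the declaration re-added.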

I recommend that you leave the signature alone, unless you're an expert in XML. The .NET XML APIs understand the rules of XML. Unless you understand them just as well, I recommend you leave it to the expert.

John Saunders
Thank you. That explains what I wanted to know. I'll leave them alone :-)
Daniel Magliola