We're using DataContractSerializer to serialize our data to XML. Recently we found a bug with how the string "\r\n" gets saved and read back: it was turned into just "\n". Apparently, what causes this is using an XmlWriter with Indent = true set:
// using System.IO; using System.Runtime.Serialization; using System.Xml;
// public class Test { public string Line; }
var serializer = new DataContractSerializer(typeof(Test));

// Write through an indenting XmlWriter.
using (var fs = File.Open("C:/test.xml", FileMode.Create))
using (var wr = XmlWriter.Create(fs, new XmlWriterSettings() { Indent = true }))
    serializer.WriteObject(wr, new Test() { Line = "\r\n" });

// Read it back: test.Line comes out as "\n" instead of "\r\n".
Test test;
using (var fs = File.Open("C:/test.xml", FileMode.Open))
    test = (Test) serializer.ReadObject(fs);
The obvious fix is to stop indenting the XML, and indeed removing the XmlWriter.Create line (so that WriteObject writes straight to the stream) makes the Line value round-trip correctly, whether it's "\n", "\r\n" or anything else.
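For anyone who wants to reproduce the comparison, here is a minimal sketch that does the round trip in memory instead of through C:/test.xml. The RoundTripDemo class and its methods are hypothetical names made up for this post, and the Test class is given explicit data contract attributes. Besides the two cases described above, it also tries NewLineHandling.Entitize, which the XmlWriterSettings documentation describes as preserving all characters when the output is read by a normalizing reader; whether that is sufficient in general is an assumption, not something verified beyond this one string.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Xml;

// Same shape as the Test class above, with explicit data contract attributes.
[DataContract]
public class Test
{
    [DataMember]
    public string Line;
}

// Hypothetical helper for this post; not part of any library.
public static class RoundTripDemo
{
    // Serializes a Test holding 'value' into memory and reads it back.
    // Pass null for 'settings' to write directly to the stream (no XmlWriter).
    public static string RoundTrip(string value, XmlWriterSettings settings)
    {
        var serializer = new DataContractSerializer(typeof(Test));
        using (var ms = new MemoryStream())
        {
            if (settings == null)
            {
                serializer.WriteObject(ms, new Test { Line = value });
            }
            else
            {
                // Disposing the writer flushes it; CloseOutput defaults to
                // false, so the MemoryStream stays open for reading.
                using (var writer = XmlWriter.Create(ms, settings))
                {
                    serializer.WriteObject(writer, new Test { Line = value });
                }
            }

            ms.Position = 0;
            return ((Test)serializer.ReadObject(ms)).Line;
        }
    }

    // Makes CR and LF visible when printed.
    public static string Show(string s)
    {
        return s.Replace("\r", "\\r").Replace("\n", "\\n");
    }

    public static void Main()
    {
        var indentOnly = new XmlWriterSettings { Indent = true };
        var indentEntitize = new XmlWriterSettings
        {
            Indent = true,
            NewLineHandling = NewLineHandling.Entitize
        };

        Console.WriteLine("direct to stream:  " + Show(RoundTrip("\r\n", null)));
        Console.WriteLine("Indent only:       " + Show(RoundTrip("\r\n", indentOnly)));
        Console.WriteLine("Indent + Entitize: " + Show(RoundTrip("\r\n", indentEntitize)));
    }
}
```

If the Entitize variant prints the original value, that would suggest the loss comes from how the writer emits the carriage return rather than from indentation as such, but it says nothing about what third-party tools do when they re-save the file.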
However, the way DataContractSerializer writes it still doesn't seem to be entirely safe, or perhaps even correct. For example, just reading the resulting file with XML Notepad and saving it again destroys both "\n" and "\r\n" values completely.
What is the correct approach here? Is using XML as a format for serializing binary data a flawed concept? Are we wrong to expect that tools like XML Notepad won't break our data? Do we need to augment each and every string field that could contain such text with some special attribute, perhaps something to force CDATA?