When I use XmlReader.ReadOuterXml(), elements are separated by \n instead of \r\n. So, for example, if I have an XmlDocument representation of

<A>
<B>
</B>
</A>

I get

<A>\n<B>\n</B>\n</A>

Is there an option to specify the newline character? XmlWriterSettings has one, but XmlReader doesn't seem to.

Here is my code to read the XML. Note that XmlWriterSettings has NewLineHandling = Replace by default.

XmlDocument xmlDocument = <Generate some XmlDocument>
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;

// Use a memory stream because it accepts UTF8 characters.  If we use a 
// string builder the XML will be UTF16.
using (MemoryStream memStream = new MemoryStream())
{
    using (XmlWriter xmlWriter = XmlWriter.Create(memStream, settings))
    {
        xmlDocument.Save(xmlWriter);
    }

    //Set the pointer back to the beginning of the stream to be read
    memStream.Position = 0;
    using (XmlReader reader = XmlReader.Create(memStream))
    {
        reader.Read();
        string header = reader.Value;
        reader.MoveToContent();
        return "<?xml " + header + " ?>" + Environment.NewLine + reader.ReadOuterXml();
    }
}
A: 

XmlReader reads files; it does not write them. If you are getting \n from your reader, it is because that's what's in the file. Both \n and \r are whitespace and are semantically the same in XML, so the difference will not affect the meaning or content of the data.

Edit:

That looks like C#, not Ruby. As binarycoder says, ReadOuterXml is defined to return normalized XML. Typically this is what you want. If you want the raw XML, you should use Encoding.UTF8.GetString(memStream.ToArray()) instead of XmlReader.
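
A minimal sketch of that raw-bytes approach, assuming the goal is an indented UTF-8 string with \r\n line ends. The names RawXmlExample and ToIndentedXml are made up for illustration. One assumption worth flagging: the encoding is set to a UTF8Encoding without a byte order mark, because otherwise the writer emits a BOM to the stream and it would show up as the first character of the decoded string.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class RawXmlExample
{
    // Hypothetical helper: serializes the document to UTF-8 bytes and decodes
    // them directly, so no XmlReader is involved and the writer's newlines survive.
    public static string ToIndentedXml(XmlDocument xmlDocument)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.NewLineChars = "\r\n";               // explicit, instead of relying on the platform default
        settings.Encoding = new UTF8Encoding(false);  // assumption: no byte order mark wanted

        using (MemoryStream memStream = new MemoryStream())
        {
            using (XmlWriter xmlWriter = XmlWriter.Create(memStream, settings))
            {
                xmlDocument.Save(xmlWriter);
            }

            // The writer has been disposed (and therefore flushed) at this point.
            return Encoding.UTF8.GetString(memStream.ToArray());
        }
    }

    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<A><B></B></A>");

        string xml = ToIndentedXml(doc);
        Console.WriteLine(xml);
        Console.WriteLine(xml.Contains("\r\n"));
    }
}
```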

Dour High Arch
Dour, I added my code. If I use XmlWriter with NewLineHandling = Replace, shouldn't it write the correct string?
+1  A: 

XmlReader will automatically normalize \r\n to \n. Although this seems unusual on Windows, it is actually required by the XML Specification (http://www.w3.org/TR/2008/REC-xml-20081126/#sec-line-ends).

You can do a String.Replace:

string s = reader.ReadOuterXml().Replace("\n", "\r\n");
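
If the string might already contain \r\n in places, a slightly more defensive variant can help. This is only a sketch: NewlineExample and ToCrLf are hypothetical names, and collapsing \r\n to \n first is my own addition so that an existing \r\n does not turn into \r\r\n.

```csharp
using System;
using System.IO;
using System.Xml;

static class NewlineExample
{
    // Hypothetical helper: converts every line ending to CRLF.
    // Collapsing CRLF to LF first keeps an existing "\r\n" from becoming "\r\r\n".
    public static string ToCrLf(string text)
    {
        return text.Replace("\r\n", "\n").Replace("\n", "\r\n");
    }

    static void Main()
    {
        string source = "<A>\r\n<B>\r\n</B>\r\n</A>";

        using (XmlReader reader = XmlReader.Create(new StringReader(source)))
        {
            reader.MoveToContent();

            // The reader hands back line ends in normalized form.
            string outer = reader.ReadOuterXml();
            string restored = ToCrLf(outer);

            // Compare the restored text against the original input.
            Console.WriteLine(restored == source);
        }
    }
}
```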
binarycoder
A: 

There's a quicker way if you're just trying to get to UTF-8. First create a writer:

public class EncodedStringWriter : StringWriter
{
    private Encoding _encoding;

    public EncodedStringWriter(StringBuilder sb, Encoding encoding)
        : base(sb)
    {
        _encoding = encoding;
    }

    // Report the requested encoding instead of StringWriter's default (UTF-16).
    public override Encoding Encoding
    {
        get
        {
            return _encoding;
        }
    }
}

Then use it:

XmlDocument doc = new XmlDocument();
doc.LoadXml("<foo><bar /></foo>");

StringBuilder sb = new StringBuilder();
XmlWriterSettings xws = new XmlWriterSettings();
xws.Indent = true;

using (EncodedStringWriter w = new EncodedStringWriter(sb, Encoding.UTF8))
{
    using (XmlWriter writer = XmlWriter.Create(w, xws))
    {
        doc.WriteTo(writer);
    }
}
string xml = sb.ToString();

Gotta give credit where credit is due.

micahtan