
Currently I'm writing XHTML into an XmlDocument. This works perfectly, but I'm stuck on one problem. Some XmlText nodes can contain things like &nbsp;. When I write such a document to a stream, it uses the InnerXml instead of the InnerText value for those nodes. The problem is that the output is wrong, because it now contains &amp;nbsp; instead of &nbsp;. How can I use XmlWriter and XmlDocument without this escaping when writing to a stream? I just want unescaped output.

A:

If you use XmlWriter.WriteRaw, it won't perform any escaping - it assumes you've got raw XML.

For example:

using System;
using System.Xml;

class Test
{
    static void Main()
    {
        using (XmlWriter writer = XmlWriter.Create(Console.Out))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("root");
            writer.WriteRaw("<element>&nbsp;</element>");
            writer.WriteEndElement();
            writer.WriteEndDocument();
        }
    }
}

Output:

<?xml version="1.0" encoding="IBM437"?><root><element>&nbsp;</element></root>
Jon Skeet
Is this also possible with the XmlDocument.Save method? I don't want to walk the DOM tree myself, because the tree is generated by an interpreter.
A: 

Assuming you are using .NET 3.x, learn and use LINQ to XML. The API is simpler and more capable, and you don't need to walk or traverse the DOM; instead you can just query the object tree.

Specifically, look into the XDocument class of that API.
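A minimal sketch of what querying with XDocument looks like, assuming .NET 3.5 or later (System.Xml.Linq). The class XDocumentSketch and the element names are made up for illustration. Note that LINQ to XML escapes & in text content exactly like XmlDocument does, so switching APIs does not by itself change the escaping described in the question.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class, for illustration only.
static class XDocumentSketch
{
    // Builds a small XHTML-like tree and queries it without walking it by hand.
    public static string[] ParagraphTexts()
    {
        var doc = new XDocument(
            new XElement("html",
                new XElement("body",
                    new XElement("p", "first"),
                    new XElement("p", "second"))));

        // Descendants("p") returns every <p> element at any depth, in document order.
        return doc.Descendants("p").Select(p => p.Value).ToArray();
    }

    // Text content is escaped on output, just as with XmlDocument.
    public static string EscapedText()
    {
        return new XElement("p", "&nbsp;").ToString();
    }
}
```

For instance, ParagraphTexts() yields "first" and "second", while EscapedText() yields <p>&amp;nbsp;</p>.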

A:

You're almost certainly trying to solve the wrong problem here. If you want text with non-breaking spaces, then you should use the non-breaking space character. In a C# string literal you can write it as the escape sequence \u00A0, for example:

using System.Xml;

var xmldoc = new XmlDocument();
XmlElement test = xmldoc.CreateElement("test");
xmldoc.AppendChild(test);
// "\u00A0" is the non-breaking space character itself, not an entity reference
XmlText nbsp = xmldoc.CreateTextNode("\u00A0");
test.AppendChild(nbsp);

HTML entities like &nbsp; are just a way to encode such characters in a text file that is not Unicode-encoded. You shouldn't use them when constructing an XML DOM. By the way, if you force .NET to write the above DOM to an ASCII-encoded file (via the appropriate XmlWriterSettings), it will probably write the non-breaking space character as &#xA0;. In a UTF-8 encoded file (the default) the character is written directly, as the bytes C2 A0, which most viewers display as a space.
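The encoding claim can be checked with a short sketch. It assumes the writer is created with XmlWriter.Create and an XmlWriterSettings whose Encoding is set; the class AsciiOutputSketch and its Serialize method are hypothetical names. With ASCII the non-breaking space cannot be encoded, so the writer should fall back to a character reference; with UTF-8 the character itself is written.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper class, for illustration only.
static class AsciiOutputSketch
{
    // Serializes <test>U+00A0</test> with the given encoding and returns the
    // written bytes decoded back to a string so the escaping can be inspected.
    public static string Serialize(Encoding encoding)
    {
        var xmldoc = new XmlDocument();
        XmlElement test = xmldoc.CreateElement("test");
        xmldoc.AppendChild(test);
        test.AppendChild(xmldoc.CreateTextNode("\u00A0"));

        var settings = new XmlWriterSettings
        {
            Encoding = encoding,
            OmitXmlDeclaration = true // keep the output to just the element
        };

        using (var stream = new MemoryStream())
        {
            // Disposing the writer flushes it; the stream stays open (CloseOutput is false).
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                xmldoc.Save(writer);
            }
            return encoding.GetString(stream.ToArray());
        }
    }
}
```

Passing new UTF8Encoding(false) rather than Encoding.UTF8 avoids a byte order mark ending up at the start of the decoded string.");
System.Diagnostics.Trace.Assert(AsciiOutputSketch.Serialize(new System.Text.UTF8Encoding(false)) == "<test>\u00A0</test>");
</test>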

If you force certain literal character sequences into the XML output, you risk creating invalid XML that conforming XML processors cannot load. For example, try to load <test>&nbsp;</test> into an empty XmlDocument: this throws an exception, because the entity nbsp is not declared. To be fair, you can declare such entities, and the XHTML DTD does so. But I hope you see my point.
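The exception is easy to reproduce. In this sketch, the hypothetical helper TryLoad reports whether XmlDocument.LoadXml accepts a string. The second test input declares nbsp in an internal DTD subset, which assumes DTD parsing is enabled, as it is by default for LoadXml.

```csharp
using System.Xml;

// Hypothetical helper class, for illustration only.
static class UndeclaredEntitySketch
{
    // Returns true if the markup loads into an XmlDocument without error.
    public static bool TryLoad(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            // Thrown for well-formedness errors such as a reference to an undeclared entity.
            return false;
        }
    }
}
```

TryLoad("") returns false, while the same element preceded by <!DOCTYPE test [<!ENTITY nbsp "&#160;">]> loads.
<test>
System.Diagnostics.Trace.Assert(!UndeclaredEntitySketch.TryLoad("<test>&nbsp;</test>"));
System.Diagnostics.Trace.Assert(UndeclaredEntitySketch.TryLoad("<!DOCTYPE test [<!ENTITY nbsp \"&#160;\">]><test>&nbsp;</test>"));
</test>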

Edit: XmlDocument is doing its job correctly. If it didn't escape characters such as &, < and >, you could create invalid XML that is impossible to load again. To force an entity reference into the output, use XmlDocument.CreateEntityReference. The bug is in whatever code puts entities into XmlText nodes instead of generating XmlEntityReference nodes.
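A minimal sketch of the CreateEntityReference approach; the class EntityReferenceSketch and its Build method are hypothetical names. The entity reference node is written out as &nbsp; rather than being escaped, but the resulting document can only be loaded again if nbsp is declared somewhere, for example in the XHTML DTD.

```csharp
using System.Xml;

// Hypothetical helper class, for illustration only.
static class EntityReferenceSketch
{
    public static string Build()
    {
        var xmldoc = new XmlDocument();
        XmlElement test = xmldoc.CreateElement("test");
        xmldoc.AppendChild(test);
        // An XmlEntityReference node is serialized as &name; instead of being
        // escaped the way the content of an XmlText node is.
        test.AppendChild(xmldoc.CreateEntityReference("nbsp"));
        return xmldoc.OuterXml;
    }
}
```");
</test>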

Wim Coenen
I don't think this solution is suitable for me. I can't force the input into the format you're suggesting, because it's coming from a parser. I also tried your solution with the writer, but it doesn't work. I think the problem is not the writer itself but the XmlDocument, which already performs the escaping: when I look at the InnerXml of the document, the escaping has already happened there. I can't use InnerText, because I also need the XHTML tags. The non-breaking space was just an example; the problem appears with every kind of HTML entity.
@Captain007 What parser are you using?
Alohci
A handmade parser for a domain-specific language. In that language, escaped and unescaped characters may be used together.
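Since the parser is handmade, one option is to decode entities before creating text nodes, which is the "use the actual character" approach from the answer above applied to the parser's output. This sketch swaps in System.Net.WebUtility.HtmlDecode for whatever decoding the parser might do itself; DecodeBeforeInsertSketch, Build and parserText are hypothetical names. It assumes an & in the DSL that does not form a known entity should be kept as a literal &, which is worth checking against the language's rules.

```csharp
using System.Net;
using System.Xml;

// Hypothetical helper class, for illustration only.
static class DecodeBeforeInsertSketch
{
    // parserText stands for raw text produced by the DSL parser, possibly
    // containing HTML entities such as &nbsp; or &amp;.
    public static string Build(string parserText)
    {
        var xmldoc = new XmlDocument();
        XmlElement p = xmldoc.CreateElement("p");
        xmldoc.AppendChild(p);
        // HtmlDecode turns &nbsp; into U+00A0 and &amp; into '&'; XmlDocument
        // then escapes only what XML itself requires when the tree is written.
        p.AppendChild(xmldoc.CreateTextNode(WebUtility.HtmlDecode(parserText)));
        return xmldoc.OuterXml;
    }
}
```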